ABSTRACT



The space-based gravitational wave detector LISA 
will observe in the low-frequency gravitational-wave 
band (0.1 mHz up to 1 Hz). LISA will search for 
a variety of expected signals, and when it detects a 
signal it will have to determine a number of param- 
eters, such as the location of the source on the sky 
and the signal's polarisation. This requires pattern- 
matching, called matched filtering, which uses the 
best available theoretical predictions about the char- 
acteristics of waveforms. All the estimates of the 
sensitivity of LISA to various sources assume that 
the data analysis is done in the optimum way. Be- 
cause these techniques are unfamiliar to many young 
physicists, I use the first part of this lecture to give 
a very basic introduction to time-series data analy- 
sis, including matched filtering. The second part of 
the lecture applies these techniques to LISA, showing
how estimates of LISA's sensitivity can be made, and 
briefly commenting on aspects of the signal-analysis 
problem that are special to LISA. 
1. INTRODUCTION



LISA, or any other space-based detector operating in
the low-frequency regime, will take data continuously
for a number of years (Bender et al. 1999). It will
look simultaneously at the whole sky, with varying 
sensitivity in different directions. It will record all 
the waves of sufficient strength in its observational 
frequency band that pass through the solar system 
during the lifetime of the mission. But these waves 
will generally not be strong enough to be visible in 
the time-series data. The output of the detector will 
simply look like pure noise most of the time. 



To recognize and extract the signals, one must ap- 
ply special computer operations, called filters, to the 
data to remove the noise and retain the signal. Fil- 
ters are constructed from theoretical expectations of 
what the waveform will look like. All our predictions
of the performance and sensitivity of LISA assume 
that the data analysis systems implement filtering in 
an optimum way. It is therefore impossible to un- 
derstand how the LISA sensitivity is arrived at with- 
out knowing the elements of the theory of time-series 
data analysis, and especially of filtering. 

A filter that is known to most physicists, and which 
provides a simple example of the principle of filter- 
ing, is the Fourier transform. If one expects the data 
to contain a simple signal of constant but unknown 
frequency, then the Fourier transform is the ideal fil- 
ter to use to find it. Even if, in the time-series data, 
the signal amplitude is too small to be seen against 
noise, the Fourier transform allows one to identify the 
signal: it rearranges the data in such a way that the 
noise is spread over the whole spectrum but the power 
in the signal is concentrated at one frequency. After 
applying this filter, the ratio of the signal's Fourier 
amplitude to the standard deviation of the noise at 
nearby frequencies can be very large. Moreover, the 
filter has given us additional information about the 
signal: its amplitude, frequency, and phase, all of 
which can be regarded as parameters of the expected 
signal waveform that we wanted to determine. 
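
The idea of the Fourier transform as a filter can be illustrated with a small numerical sketch. The values below (number of samples, amplitude, noise level, frequency index, random seed) are arbitrary assumptions chosen for illustration, and the transform is a deliberately simple direct sum rather than an efficient FFT. A sinusoid with amplitude half the noise standard deviation is hidden in the time series, yet its Fourier amplitude stands well above the noise in neighbouring frequencies.

```python
import cmath
import math
import random


def dft_magnitudes(x):
    """Return |X_k| for k = 0..N/2 using a direct (slow) discrete Fourier transform."""
    n = len(x)
    # Precompute the N complex roots of unity once.
    w = [cmath.exp(-2j * math.pi * m / n) for m in range(n)]
    return [abs(sum(x[j] * w[(j * k) % n] for j in range(n)))
            for k in range(n // 2 + 1)]


random.seed(1)                              # arbitrary seed, for reproducibility
N, k0, amp, sigma = 1024, 100, 0.5, 1.0     # illustrative values only
data = [amp * math.cos(2 * math.pi * k0 * j / N) + random.gauss(0.0, sigma)
        for j in range(N)]

mags = dft_magnitudes(data)
# Strongest bin, ignoring the zero-frequency bin.
peak_bin = max(range(1, len(mags)), key=lambda k: mags[k])
# Typical noise level: the median magnitude over all bins.
noise_level = sorted(mags)[len(mags) // 2]
print(peak_bin, mags[peak_bin] / noise_level)
```

The position of the peak estimates the frequency, and its complex amplitude (not printed here) carries the amplitude and phase of the signal.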

Unfortunately for LISA, we do not expect its data 
to contain gravitational waveforms of constant fre- 
quency. We do expect constant-frequency sources, 
such as binary systems, but the motion of the detec- 
tor as it orbits the Sun imposes a Doppler shift on the 
incoming wave. In the data itself, no physical signal 
will remain of constant frequency. Moreover, many 
other sources, such as the coalescences of supermas- 
sive black hole binaries, produce more complicated 
signals whose frequencies depend on time in a pre- 
dictable way. Therefore, the filters that we will need 
for LISA must be more sophisticated than the simple 
Fourier transform. Their construction and applica- 
tion is the subject of this lecture. 

I will build on previous lectures at this school, which 
have covered the theory of general relativity and of 
gravitational waves, the design of the LISA detector, 
and the nature of the likely sources of low-frequency 
waves. As in my previous lecture on sources, I will as- 
sume that many students have no previous expertise 
in this area, so in the first part of the lecture I will 
introduce the fundamentals of signal-analysis princi- 
ples, including matched filtering. In the second part 
I will describe how signal-analysis principles are used 
to estimate the sensitivity of LISA to various sources,
and I will highlight some special signal-analysis prob- 
lems that LISA will face. 



2. ANALYSIS OF TIME-SERIES DATA 



2.1. Detection is a Statistical Problem 

The data coming from LISA will be a single time- 
series of data values, sampled uniformly in time. For 
the purposes of this lecture, I will assume that this 
data set has been calibrated and reduced to a data 
stream that directly corresponds to the amplitude h 
of a gravitational wave that would be required to produce
the observed strain $\delta\ell/\ell$ in the arms of either of
the two LISA interferometers. We do not need here 
to worry about instrumental issues, like the extrac- 
tion of the data from the various signals within the 
detector, or the conversion of the data from measure- 
ments of changes of arm-length to measurements of 
gravitational wave amplitudes. 

The primary problem of the detection of gravitational 
radiation consists of identifying a gravitational wave- 
form in a noisy signal. The noise can be intrinsic 
to the detector itself (vibrations or photon-counting
fluctuations) or it can be external interference (a 
gravitational wave background due to cosmic sources 
or local binaries). If nothing is known ahead of time 
about the signal, then the only way to detect it is 
to look in the time-series data for signals that are so 
strong that one would not expect noise to duplicate 
them during the observation period. 

It is possible to recognise much weaker signals if we 
know what to expect. However, even if the form of 
the wave is known, it may still depend on a number 
of parameters whose values in any particular event 
are unknown. The amplitude and polarisation of the 
wave are always unknown parameters. Other typical 
parameters include the time-of-arrival of a signal that 
has a limited duration, or the masses of the stars in a 
binary system that emits radiation. Since the wave- 
form will be different for different parameters, finding 
the signal usually involves estimating its parameters 
at the same time. This is where much of the scientific 
return of the LISA mission will be. 

Because all data streams contain random noise, the 
detection of a signal is always a decision based on 
probabilities. It is never possible to be 100% sure of 
a detection: there is always a chance that the noise 
conspired in some random way to look very convinc- 
ingly like an expected signal. The aim of detection 
theory is to assess this probability, so that a confi- 
dence level can be attached to any claim of a detec- 
tion. 

From the point of view of the data stream, there is 
nothing special about a signal: the data are just a se- 
ries of random values. The experimenter must decide 
what pattern in the data stream constitutes a signal 
and what does not. Every signal should be defined 
before the analysis begins. In some cases, the defi- 
nition is very precise, such as that the signal must 
look like that expected from a coalescing black-hole 
binary, within some range of parameters. In other 



cases, the definition could be more inclusive, such as 
looking for a signal with an arbitrary waveform that 
is so strong in the time-series noise that one would 
not expect it to arise by chance more than once in, 
say, $10^4$ years of observing. The more precise the
definition of the signal, the less likely it is that ran- 
dom noise will duplicate it, and so the deeper into 
the noise one can go to find it. Experimenters should 
agree on the set of signals they are looking for before
they analyse the data. 

One analyst's signal could be another's noise. A grav- 
itational wave hunter would love to find the coalesc- 
ing binary waveform, but would throw away features 
in the data that are clearly of instrumental origin. 
The instrument-builder, on the other hand, might 
look precisely for such features as clues to the be- 
haviour of the detector. Even the noise could be a 
signal: we expect LISA to see a random background 
of waves from galactic binaries that is larger at some 
frequencies than the intrinsic detector noise. 

It is therefore important for the student approach- 
ing this subject for the first time to grasp the notion 
that signals are defined by the analyst, not by the 
experiment. The analyst simply looks for a pattern 
that seems close enough to his or her pre-conceived 
notion of a signal. Any such identification has an 
element of chance in it, and one must assess one's 
confidence that the signal really did arise from an 
external source rather than something inside the detector.
The less information one uses in defining the
expected signal (as when one is looking for new and 
unexpected sources), the easier it will be for noise to 
fit the definition, and the stronger must be the signal 
to be convincing. 



2.1.1. An example of signal identification pitfalls 

It may be helpful to illustrate the difference between 
an experiment that defines its signal before looking at 
the data and one that does not with a trivially simple 
problem. It is often observed that any "random" se- 
quence of numbers contains many patterns: what is 
different between random and deterministic numbers 
is that one cannot predict the patterns in the random 
number sequence. To be concrete, suppose a student 
uses a computer program to generate a sequence of 
10 integers, each one chosen randomly (and with uni- 
form probability) from the integers 0..9. The output 
of the first run of this program is the sequence (3, 7, 
0, 1, 2, 3, 0, 9, 6, 1). The student notices right away 
that the sequence contains the subsequence (0, 1, 2, 
3) in that order, beginning at the third integer in his 
list. He did not expect this. Is it significant in some 
way? 

He calculates that the probability that this sequence 
will appear, starting with the third element in the 
list, is $10^{-4}$. But he reasons that this is not a fair
assessment of the chances that the first sequence would
surprise him: he would have been equally surprised 
had the sequence started at the second integer, or 
the fifth. Since there are seven possible starting posi- 
tions that would allow the whole sequence to appear, 
the chance that this subsequence would appear
somewhere in the data is 0.0007: this will happen 
about once in every 1400 experiments if the random- 
number generator is working correctly. This tempts 
the student to suspect an error in the program that
generates the numbers. But is this yet the true signif- 
icance? He reasons that he would also have noticed 
something unusual if the sequence had been (1, 2, 3, 
4), or (9, 8, 7, 6). When he calculates the chances of 
getting any such sequence in the data, then he gets 
about 0.01. He begins to think maybe he was just 
unlucky that this happened on the first run of his 
program. But he really does not know how to assess 
the significance of this event. 
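
The numbers in this example can be checked with a short calculation. The sketch below is an illustration, not part of the original lecture: it computes the exact probability that a given pattern appears somewhere in a sequence of random digits, by following how many elements of the pattern are currently matched. The helper names are hypothetical. The final line assumes that "any such sequence" means the 7 ascending and 7 descending runs of four consecutive digits, and it sums their individual probabilities, which gives an upper bound.

```python
from fractions import Fraction


def next_state(pattern, matched, digit):
    """Length of the longest prefix of `pattern` that is a suffix of
    the already-matched prefix followed by `digit`."""
    seq = pattern[:matched] + (digit,)
    for length in range(min(len(seq), len(pattern)), 0, -1):
        if seq[-length:] == pattern[:length]:
            return length
    return 0


def prob_pattern_appears(pattern, n_draws, n_symbols=10):
    """Exact probability that `pattern` occurs somewhere in `n_draws`
    independent uniform draws from range(n_symbols)."""
    m = len(pattern)
    # state[s] = probability that s pattern elements are currently matched;
    # state[m] is absorbing: the pattern has already appeared.
    state = [Fraction(0)] * (m + 1)
    state[0] = Fraction(1)
    p = Fraction(1, n_symbols)
    for _ in range(n_draws):
        new = [Fraction(0)] * (m + 1)
        new[m] = state[m]
        for s in range(m):
            for d in range(n_symbols):
                new[next_state(pattern, s, d)] += state[s] * p
        state = new
    return state[m]


exact = prob_pattern_appears((0, 1, 2, 3), 10)
print(float(exact))                 # close to the estimate 7 * 1e-4

# Assumed reading of "any such sequence": 14 runs of four consecutive digits.
upper_bound_any_run = 14 * exact
print(float(upper_bound_any_run))   # close to the quoted value of about 0.01
```

The exact single-pattern result is very slightly below 0.0007, because the simple count of seven starting positions counts twice the rare sequences in which the pattern appears two times.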

In this case the solution to his decision problem is 
simple: he should run the program again, this time 
choosing to look for the "signal" (0, 1, 2, 3) some- 
where in the ten integers produced. Now the prob- 
ability of obtaining this signal from a good random- 
number generator really is 0.0007. So if the signal 
appears again in the next sequence, it is very prob- 
able that the program is defective. If 1400 different 
experiments like this were performed with the same 
result, then in 1399 of them the experimenter would 
be justified in throwing away the program and buy- 
ing a new one. Of course, it is still possible that our 
student will be the unlucky 1400th who arrived at 
this result at random! 

What happens if something analogous occurs in the 
LISA analysis? Particularly, what if one has found an 
unusual and unexpected "signal" using all the LISA 
data, so the option of going back and making another 
measurement is not there? Or what if the source of 
the "signal" could be transient, so that further data 
taking would not be expected to see another such 
event anyway? In order to minimise the chance that 
this will happen with LISA (or with other such exper- 
iments), it is important to define before the experi- 
ment what data sequences will be regarded as signals, 
so that their probabilities can be unambiguously cal- 
culated (like the number 0.0007 above). Even if the 
criterion is crude, to allow one to discover unexpected 
waveforms, one must define the criterion and calcu- 
late the chances of random noise satisfying it. If this 
is not done, then one runs the risk of not being able 
to decide the significance of an observation. 



2.1.2. Literature 



There is a large literature on signal analysis, much
of it in electrical engineering. At least in part because
this problem is very similar to the problem of
the detection of a target by radar and of the estimation
of target parameters like range and velocity,
the theory of detection and estimation is well understood.
Standard textbooks on the theory of signal
detection include a number of monographs (Helstrom
1968; Van Trees 1968; Whalen 1971; Wainstein
& Zubakov 1962). These texts are all oriented towards
applications to radar. The first introduction
to signal-analysis theory aimed at the detection of
gravitational waves by broadband detectors is a review
article by Davis 1989. This article has the additional
advantage of being written in the contemporary
language of stochastic processes. The techniques
of time-series analysis (Fourier transforms, etc.) are
fundamental tools of the trade. There are many
introductory textbooks, such as Bracewell 1978. I
have previously reviewed these issues principally in
the context of ground-based interferometers (Schutz
1991; Schutz 1997).



2.2. Signals in Noise

As a result of noise a datum from the detector is
a value of a certain random variable. Since we take
measurements at regular intervals of time, the data
from a detector form a sample of a certain (discrete)
stochastic process. Regardless of whether a signal is
present or not, the data will still be stochastic, but
the presence of a signal will affect the probability
distribution of the stochastic process.

We will use the following notation: $x_j$ denotes a sample
of the experimental data (the random variable),
and $x = (x_0, x_1, \ldots, x_{N-1})$ denotes the entire data set
of N samples, which is called the stochastic process.
The expected signal will be denoted by $h_j$ and the
noise (which would be the output in the absence of a
signal) is called $n_j$. Sums, where no limits are indicated,
always run from 0 to $N - 1$.



2.2.1. Detection



In general, the distribution of data values $x_j$ will be
described by some probability density function (abbreviated
pdf) p(x): the probability that $x_j$ lies between
a value x and a value x + dx is p(x)dx. The
probability for the whole set of values (process) is
the joint pdf of the process. If the points are statistically
independent, the joint probability is the product
of the individual probabilities. But data values are
not always (in fact, not usually) statistically independent.

Since the output of the detector depends on whether
a signal is present or not, the pdf of the data must
be different in the two cases. If there is no signal
we call the joint pdf $p_0(x)$; if the signal is present
we call it $p_1(x)$. We will see below an example of
how to calculate these pdf's. To decide which pdf
applies to a particular measurement x, we have to
devise a rule called the test, which decides whether
the observed data were more likely to come from a
distribution with pdf $p_0$ or $p_1$. It does this by dividing
the range of possible values of x into two sets R and
its complement R' in such a way that we decide the
pdf is $p_1(x)$ if $x \in R$ and the pdf is $p_0(x)$ if $x \in R'$.

The detection probability $P_D(R)$ is then given by the
probability that a data set x that contains the signal
will pass our test:

$$P_D(R) = \int_R p_1(x)\,dx. \qquad (1)$$

The false alarm probability $P_F(R)$ is the probability
that a data set that contains no signal passes our
test:

$$P_F(R) = \int_R p_0(x)\,dx. \qquad (2)$$



2.2.2. Confidence and significance of a detection: 
the null hypothesis 



As we have said above, detection is always proba- 
bilistic: one can never be 100% certain that a signal 
is present. There is always the possibility that the
noise has conspired to look like the desired signal. 
The only legitimate statement of the outcome of an 
experiment is to give a probability for the signal to 
be present. 

What is often done is to quote the false-alarm probability,
which is the probability that the experiment
would have had the same outcome even if no signal
had been present. The assumption that no signal is
present is called the null hypothesis. Such
calculations are delicate, since the
probability of getting a false alarm depends on how 
many times the experiment is performed. When one 
is looking for a family of signals that depends on a 
parameter, such as the masses of the stars in a binary 
system, then each search through the same data set 
for each distinguishable value of the parameter is a 
different experiment, and the probability of getting 
a false alarm increases directly with the number of 
parameter values. 
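
The growth of the false-alarm probability with the number of parameter values can be made concrete with a minimal sketch. It assumes that the searches for different parameter values are statistically independent, which is an idealisation introduced here (neighbouring templates are generally correlated); the function names and the numerical values are illustrative only.

```python
def false_alarm_any(p_single, n_trials):
    """Probability of at least one false alarm in n_trials independent
    searches, each with false-alarm probability p_single."""
    return 1.0 - (1.0 - p_single) ** n_trials


def single_trial_threshold(p_total, n_trials):
    """Per-template false-alarm probability needed so that the whole
    search has overall false-alarm probability p_total."""
    return 1.0 - (1.0 - p_total) ** (1.0 / n_trials)


p_single = 1e-6          # illustrative per-template false-alarm probability
for n_templates in (1, 1000, 100000):
    print(n_templates, false_alarm_any(p_single, n_templates))
```

As long as the product of the two numbers is small, the overall probability is very nearly n_trials times p_single, which is the direct proportionality described above.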

Sometimes the number of experiments is not easy to 
assess. When physicists look through data and find 
unexpected results, which are not part of a previously 
defined family of signals, it may be impossible to de- 
cide just how big the parameter space was, how big 
the false-alarm probability was. We illustrated this 
point in the example of Section 2.1.1 above. For a real-life
discussion of these issues in the context of gravitational
wave detection, see Dickson & Schutz 1995 and references
therein. For this reason, it is important for
experimenters to define what will be accepted as a 
signal before they look at the data. 

The false alarm probability is called the significance. 
The confidence level is one minus this. It is clear that, 
even in cases where the number of experiments is 
well-defined, the calculation of the false-alarm prob- 
ability requires a full understanding of the properties 
of the noise in the experiment. Characterising the noise 
is a vitally important part of any experiment. 



2.2.3. Systematic errors 



Until now we have assumed that the data stream con- 
sists of random noise plus a possible signal. But it 
could be influenced by other effects too. For example, 
it could contain a coherent "signal" that looks like an 
expected astronomical gravitational wave but really 
comes from an instrumental source. Or the exter- 
nal disturbances that set the low-frequency limit on 
the LISA band might contain occasional large events 
associated with fluctuations in the solar wind or ra- 
diation pressure. This is an unavoidable problem in 
data analysis, and there are no general prescriptions 
for dealing with it. One must build in as many house- 
keeping data streams and consistency checks as pos- 
sible, to give confidence that systematics play no im- 
portant role. Again, understanding all the details of 
the instrumental noise will be essential to the success 
of LISA. 



2.2.4. Neyman-Pearson test: the likelihood ratio



The most appropriate way to test for the detection of
gravitational waves seems to be the Neyman-Pearson
approach. In this approach we seek a test that maximises
the detection probability subject to a preassigned
false alarm probability $P_F(R) = \alpha$. Given
that gravitational waves have not yet been detected,
an important consideration at least at first will be
to be sure that one has detected one. By choosing
the false-alarm probability $\alpha$ sufficiently small,
one ensures that the chances of falsely identifying a
noise event as a gravitational wave are as small as
one wants. In this way one misses many true gravitational
wave events, but one has considerable confidence
that those that are identified are real.

The solution for the "detection region" R in the
Neyman-Pearson approach is given in terms of the
likelihood ratio, defined by

$$\Lambda(x) = \frac{p_1(x)}{p_0(x)}. \qquad (3)$$

If the observed data set x is more easily produced
when a signal is present than when absent, then this
will be larger than 1. The larger this is, the more
likely it is that a signal is present. If we let $\Lambda_0$ be the
likelihood threshold associated with the probability $\alpha$,

$$P_F[\Lambda(x) > \Lambda_0] = \alpha, \qquad (4)$$

then the detection region of the space of all possible
sample sets is

$$R = \{x : \Lambda(x) > \Lambda_0\}. \qquad (5)$$

Thus if for a particular observed set x we find that
$\Lambda(x) > \Lambda_0$, then we say that the signal is present;
otherwise we say that the signal is absent.

In most of the following we assume that the sampled
values of the signal $h_i = h(t_i)$ are a deterministic
function of time that we expect to find in our data, for
example because calculations of gravitational wave
sources have revealed it. We will have to treat separately
the case where the "signal" itself is noise, such
as a background generated by the Big Bang.



2.2.5. Characterisation of noise



We assume now that the noise $n(t_j)$ in the
detector is a zero-mean Gaussian stochastic process.
In fact, real noise is not completely Gaussian, and
this needs to be taken into account when systems are
designed for LISA. But for this introductory lecture,
it will be best to assume the simplest model of noise.
The main way of characterising this noise is by its
autocorrelation function $K_{ij}$:

$$K_{ij} = E[n_i n_j], \qquad (6)$$


where E[ ] denotes the expectation value. If the noise 
is zero-mean Gaussian, then the autocorrelation func- 
tion completely determines the statistics of the pro- 
cess. 
2.2.6. Matched filtering of Gaussian noise

It can be shown for such a process that the logarithm
of the likelihood ratio is given by

$$\ln \Lambda[x] = \sum_k x_k q_k - \frac{1}{2} \sum_k h_k q_k, \qquad (7)$$

where q is called the matched filter for the expected
signal h. It is defined to be the solution of the equation

$$h_j = \sum_k K_{jk} q_k. \qquad (8)$$

This is a matrix inversion for q, but if there are many 
data points the inversion may not be easy to per- 
form. We will see below how to solve this equation 
for q more easily in the case of stationary noise. The 
waveform h that is used to generate q is called the 
template for q. 

Notice that the solution for q does not depend on 
the observed data set x, but only on the statistical 
properties of the noise as given by $K_{ij}$. Once one has
solved for q, one then computes the logarithm of the
likelihood ratio from Equation (7) for any particular
data set. If this exceeds the threshold $\ln \Lambda_0$, one
claims a detection.

In fact, we can see that the likelihood ratio $\Lambda[x]$ depends
on the particular set of data x only through
the sum

$$G = \sum_k x_k q_k. \qquad (9)$$

This sum G is called the detection statistic for the
signal $h(t_j)$ which generates the filter $q(t_j)$ through
Equation (8). The second term in Equation (7) is independent
of the data set, so we do not include it in
the detection statistic. The statistic is a correlation
between the data x and the filter q.

Since we want to maximise the likelihood ratio, we 
want to maximise the detection statistic: if G is 
larger than one expects by chance, then there is a 
corresponding likelihood that the data contains the 
signal h. It is important, therefore, to note that the 
detection statistic is a dimensionless number: if x 
has units km, say, then the autocorrelation K has
units km$^2$, the filter q defined in Equation (8) has
units km$^{-1}$, and the statistic G is dimensionless. We
will see later that significant detections are associated 
with observed values of G that are large compared to 
1. 
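
The construction of the matched filter and of the detection statistic can be sketched for a very small data set, where the matrix equation for q can be solved directly. Everything specific in this sketch is an assumption made for illustration: the exponentially decaying autocorrelation, the template values, the data values, and the helper names. It is not the procedure proposed for LISA, where the data sets are far too long for direct inversion.

```python
def solve_linear(a, b):
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]   # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        tail = sum(m[i][c] * x[c] for c in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x


def detection_statistic(x, q):
    """G = sum_k x_k q_k, the correlation of data and filter."""
    return sum(xk * qk for xk, qk in zip(x, q))


def log_likelihood_ratio(x, h, q):
    """ln Lambda = sum_k x_k q_k - (1/2) sum_k h_k q_k."""
    return detection_statistic(x, q) - 0.5 * detection_statistic(h, q)


# Hypothetical example: N = 6 samples, noise with K_ij = sigma^2 rho^|i-j|.
N, sigma2, rho = 6, 4.0, 0.5
K = [[sigma2 * rho ** abs(i - j) for j in range(N)] for i in range(N)]
h = [0.0, 1.0, 2.0, 1.0, 0.0, -1.0]      # template
q = solve_linear(K, h)                    # matched filter: solves h = K q

x = [0.3, 1.4, 1.7, 0.6, -0.2, -1.1]      # made-up data set
print(detection_statistic(x, q), log_likelihood_ratio(x, h, q))
```

Note that q is computed once from the noise properties and the template; only the final correlation involves the data.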



2.3. Data Analysis in the Presence of Stationary Noise

A very important special case is that of stationary 
noise, which is defined as a process which is indepen- 
dent of the origin of time, i.e. of when the experiment 
started. The autocorrelation then depends only on
the difference between the times $t_i$ and $t_j$: there exists
a $C_k$ such that

$$K_{ij} = C_{i-j}. \qquad (10)$$

LISA will normally observe long-lived sources, so the 
assumption of stationarity is a strong one: most in- 
struments change their sensitivity over time. Again, 



for the purpose of this lecture, this assumption is a 
good first step. But a full data analysis program for 
LISA must address the best way to achieve long-term 
sensitivity as the detector changes. 

The great advantage of stationary noise is that, if the 
noise is stationary and if the whole of the signal is 
included in the interval [0, T], then Equation (8) can
be solved explicitly for the filter by Fourier transform 
techniques. To see how this happens we must first 
review some part of the theory of discrete Fourier 
transforms. 



2.3.1. Definition of the discrete Fourier transform 



Given any data series $x_j$ of N points numbered from
0 to $N - 1$, we define its discrete Fourier transform
(DFT) $\tilde{x}_k$ by

$$\tilde{x}_k = \sum_{j=0}^{N-1} x_j e^{-2\pi i jk/N}. \qquad (11)$$

Its inverse is

$$x_j = \frac{1}{N} \sum_{k=0}^{N-1} \tilde{x}_k e^{2\pi i jk/N}. \qquad (12)$$

The factor of 1/N must not be forgotten. There are 
various conventions for these definitions, according to 
whether 1/N is placed in the Fourier transform or in 
its inverse, or is shared between them. 
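
The convention adopted here, with no factor in the forward transform and 1/N in the inverse, can be written out directly. The sketch below is a plain transcription of the two sums for illustration (a practical analysis would use a fast Fourier transform); the test signal and its frequency index are arbitrary choices.

```python
import cmath
import math


def dft(x):
    """Forward transform: sum over j of x_j exp(-2 pi i j k / N)."""
    n = len(x)
    return [sum(x[j] * cmath.exp(-2j * math.pi * j * k / n) for j in range(n))
            for k in range(n)]


def inverse_dft(xt):
    """Inverse transform; note the factor 1/N."""
    n = len(xt)
    return [sum(xt[k] * cmath.exp(2j * math.pi * j * k / n) for k in range(n)) / n
            for j in range(n)]


N = 16
k0 = 3                                   # illustrative frequency index
x = [math.cos(2 * math.pi * k0 * j / N) for j in range(N)]
xt = dft(x)
roundtrip = inverse_dft(xt)

# A real cosine at index k0 puts amplitude N/2 at k0 and at N - k0.
print(abs(xt[k0]), abs(xt[N - k0]))
```

Applying the inverse to the transform recovers the original samples, which is a quick check that the factor 1/N has been placed consistently.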

If the index j counts sampled points in the time 
domain, the index k is the frequency index. Since 
a given real signal has Fourier components at both 
positive and negative frequencies (as in
$\cos(2\pi f t) = \frac{1}{2}[\exp(2\pi i f t) + \exp(-2\pi i f t)]$,
which has components
at frequencies $\pm f$), both positive and negative frequencies
are present in the Fourier transform, with
complex-conjugate amplitudes. With our conventions,
the indices k = 0 to k = N/2 denote the positive
frequency components. The negative frequencies
are mapped by periodicity to larger values of k: the
negative component associated with a positive frequency
k is at $N - k$.



2.3.2. Relation to the continuous Fourier transform 

Physically, one usually likes to think in terms of the
continuous Fourier transform of a data stream x(t)
that our discrete values $x_j$ were sampled from:

$$\tilde{x}(f) = \int_{-\infty}^{\infty} x(t) e^{-2\pi i f t}\,dt. \qquad (13)$$

I will use a notation from now on that the tilde ~ denotes
a Fourier transform, and the type of transform
is given by the argument: if the argument is the continuous
variable f it is the continuous transform just
defined, and if the argument is a discrete index like
k it is the DFT.

Here we have continuous variables for time t and frequency
f. The first correspondence we must make is
between them and their discrete counterparts j and
k. We shall always assume that the time-series data
were sampled at a uniform rate, with a sampling interval
$\Delta t$. Then we have $t_j = j\Delta t$. (For convenience,
we take the origin of time t = 0 to be at the first
sampled data point.) The duration of the experiment
is $T = N\Delta t$. The frequency associated with
a given Fourier amplitude $\tilde{x}_k$ must be deduced by
equating the argument of the exponential in the definition
of the DFT in Equation (11), $2\pi i jk/N$, to that
in the continuous Fourier transform in Equation (13),
$2\pi i f t$. Replacing j by $t_j/\Delta t$, we
find that

$$f_k = k/(N\Delta t) = k/T. \qquad (14)$$

Therefore, the frequency resolution of the data is
$\Delta f = 1/T$. The DFT essentially groups the Fourier
components of the data into discrete Fourier bins of
width 1/T.

The second correspondence is to work out the re- 
lationship between values of the continuous Fourier 
transform and of the discrete version. The definition 
of the DFT does not contain an integration element 
corresponding to dt in the continuous FT, using instead
a simple dimensionless summation. This integration
interval has width $\Delta t$, so we have

$$\tilde{x}(f_k) \rightarrow \Delta t\,\tilde{x}_k. \qquad (15)$$

The arrow reminds us that this is not an equality, 
but a correspondence. The full continuous Fourier 
transform uses the whole, infinitely long data set, 
while the DFT can only approximate it from a finite 
set of sampled values. 
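
The two correspondences, $f_k = k/T$ and $\tilde{x}(f_k) \rightarrow \Delta t\,\tilde{x}_k$, can be checked on a signal whose continuous transform is known in closed form. The sketch below uses a Gaussian pulse, which is an assumed test signal and not one taken from the lecture; the sampling interval and number of points are likewise arbitrary. Because the pulse is well contained in the data interval and is smooth on the scale of the sampling interval, the correspondence is very accurate in this case.

```python
import cmath
import math

# Assumed test signal: a Gaussian pulse centred at t0,
# x(t) = exp(-pi (t - t0)^2), whose continuous transform has modulus exp(-pi f^2).
N, dt = 128, 0.1
T = N * dt                                # duration of the data set
t0 = T / 2
x = [math.exp(-math.pi * (j * dt - t0) ** 2) for j in range(N)]


def dft_bin(x, k):
    """Single DFT amplitude: sum over j of x_j exp(-2 pi i j k / N)."""
    n = len(x)
    return sum(x[j] * cmath.exp(-2j * math.pi * j * k / n) for j in range(n))


for k in (0, 5, 10):
    f_k = k / T                           # frequency of bin k
    approx = dt * abs(dft_bin(x, k))      # DFT amplitude scaled by the sampling interval
    exact = math.exp(-math.pi * f_k ** 2)
    print(f_k, approx, exact)
```

For a signal that is not contained in the interval, or that varies on the scale of the sampling interval, the agreement would be poorer, which is why the relation is a correspondence and not an equality.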



2.3.3. Properties of the DFT 



The DFT is nothing more than a complex Fourier
series representation of the finite set of sampled data
values. As a Fourier series, it assumes periodicity: if
one calculates the values of $x_j$ for $j \geq N$ from the
inverse series Equation (12), they will simply continue
the sampled data forward in a periodic way.
If the underlying continuous data contains a signal
whose spectrum as defined by Equation (13) has a
frequency $f_0$ that is exactly equal to one of the resolved
frequencies in the DFT, say $f_0 = k_0/T$, then
the DFT will have two non-zero elements at $k_0$ and
$N - k_0$. If the true signal frequency $f_0$ falls between
two discrete frequencies, then the spectrum will be
more complicated, but it will be dominated by the
amplitudes at the two nearest discrete frequencies.

The DFT is a linear operation, so in particular when
applied to noisy data it commutes with the expectation
operation:

$$E[\tilde{x}_k] = \sum_{j=0}^{N-1} E[x_j] e^{-2\pi i jk/N}.$$

There are a number of other important results that 
are easy to derive. One that we will need later is that 
if the data set is run backwards in time, the DFT is 
just the complex conjugate of the original. 



2.3.4. Convolutions using the DFT 

One of the most important operations one can per- 
form with Fourier transforms is the convolution. Note 
that with a data set that is finite, the only way 
to define a convolution is cyclically: the data wrap 
around from back to front as they are shifted, or 
equivalently they extend to indices outside the range 
$(0, \ldots, N - 1)$ periodically. Thus, the convolution of
two discrete data sets $\{h_j, j = 0, \ldots, N - 1\}$ and
$\{g_j, j = 0, \ldots, N - 1\}$ is defined as

$$(h \circ g)_j = \sum_{j'=0}^{N-1} h_{j'} g_{j-j'} = (g \circ h)_j. \qquad (16)$$

The key result here is the convolution theorem:

$$\widetilde{(h \circ g)}_k = \tilde{h}_k \tilde{g}_k. \qquad (17)$$

This allows us to perform convolutions by doing 
Fourier transforms. 
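
The convolution theorem is easy to verify numerically on a small example. The sketch below, with arbitrary test values and illustrative function names, computes the cyclic convolution directly from its definition and compares its DFT with the product of the individual DFTs.

```python
import cmath
import math


def dft(x):
    """Forward transform: sum over j of x_j exp(-2 pi i j k / N)."""
    n = len(x)
    return [sum(x[j] * cmath.exp(-2j * math.pi * j * k / n) for j in range(n))
            for k in range(n)]


def cyclic_convolution(h, g):
    """(h o g)_j = sum over j' of h_j' g_(j - j'), indices taken modulo N."""
    n = len(h)
    return [sum(h[jp] * g[(j - jp) % n] for jp in range(n)) for j in range(n)]


h = [1.0, 2.0, 0.0, -1.0, 0.5, 0.0, 0.0, 3.0]   # arbitrary test values
g = [0.5, 0.0, 1.0, 0.0, 0.0, -2.0, 1.5, 0.0]

lhs = dft(cyclic_convolution(h, g))
rhs = [hk * gk for hk, gk in zip(dft(h), dft(g))]
max_error = max(abs(a - b) for a, b in zip(lhs, rhs))
print(max_error)
```

The direct sum costs of order N squared operations, whereas the route through fast Fourier transforms costs of order N log N, which is why long convolutions are done in the Fourier domain in practice.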

In the convolution, the index j' runs in opposite di- 
rections through the sets h and g. A correlation is 
defined as the same sum with the index on g reversed: 

$$\mathrm{corr}(h,g)_j = \sum_{j'=0}^{N-1} h_{j'} g_{j'-j} = \mathrm{corr}(g,h)_{N-j}, \qquad (18)$$

where again indices are extended outside the range
$(0, \ldots, N - 1)$ by making the functions periodic. The
equivalent of the convolution theorem is now

$$\widetilde{\mathrm{corr}(h,g)}_k = \tilde{h}_k \tilde{g}_k^*, \qquad (19)$$

where the * denotes complex conjugation, which (as
remarked above) changes the transform of $g_j$ into
that of $g_{N-j}$.



2.3.5. Power spectrum and power spectral density 

To work with noise in the Fourier domain, we need 
to characterise its probability distribution at any frequency,
that is we need to understand the spectral
noise $\tilde{n}_k$, the transform of the time-series noise $n_j$.
Of course, for zero-mean noise the expectation of the
spectral noise amplitude will vanish. But its square
will not, and the expectation of the squared modulus
of the noise is (to within a factor of 2) the variance
of the spectral noise distribution.

Now, the power spectrum of a data set is defined as
the squared magnitude of the Fourier transform:

$$X_k = |\tilde{x}_k|^2.$$

Many authors call this the periodogram, but we will 
reserve that term for something a little different (be- 
low). 

When the data are noisy, we are interested in 
expectations of the power. For completeness, we 
consider the expectation of the product of the Fourier 
amplitudes of the noise alone at two (possibly) 
different frequencies: 

$$P_{kk'} = E[\tilde{n}_k \tilde{n}^*_{k'}] = \sum_{j=0}^{N-1} \sum_{j'=0}^{N-1} E[n_j n_{j'}]\, e^{-2\pi i (jk - j'k')/N}.$$

If the noise is stationary, one can use Equation @ 
and Equation ([h]) to obtain 

$$P_{kk'} = \sum_{j,j'} C_{j-j'}\, e^{-2\pi i (jk - j'k')/N}.$$

By changing variables so that $j - j'$ is a summation 
variable and then doing the other summation 
explicitly (see the footnote below), one can show that 

$$E[\tilde{n}_k \tilde{n}^*_{k'}] = S_k\, \delta_{kk'}, \qquad (20)$$

where the power spectral density (psd) $S_k$ is given in 
terms of the time-autocorrelation $C_j$ by 

$$S_k = P_{kk} \ (\text{no sum on } k) = N \sum_{j=0}^{N-1} C_j\, e^{-2\pi i jk/N}. \qquad (21)$$

The psd can be estimated from observed data by 
constructing the power spectrum and using any of a 
variety of methods to estimate from this the expectation 
value in Equation (20). One method is to average 
several successive data sets; another is to smooth the 
power spectrum by averaging groups of frequencies. 
In each case, one must convince oneself that the data 
are not "contaminated" by signals, that the expectation 
is really the expectation of the noise. 

Equation (20) shows that, for stationary noise, there 
is no correlation between the noise at different 
frequencies, and at a given frequency the psd is the 
expectation of the ordinary power spectrum. We say 
that the noise is white if the psd is flat, so that there 
is the same power at all frequencies. In this case, the 
inverse transform of Equation (21) shows that $C_j = 0$ 
for $j \neq 0$. This means that, for white noise, data 
samples in the time-series are not correlated with one 
another. Many sources of noise are intrinsically white, 
but most measuring instruments have different 
responses at different frequencies, so that the noise in 
their outputs will generally be coloured. 
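
The first estimation method, averaging the power spectra of several data sets, can be sketched as follows. This assumes NumPy and simulated white Gaussian noise of standard deviation `sigma` (all parameter values here are hypothetical). For such noise $C_0 = \sigma^2$ and $C_j = 0$ otherwise, so Equation (21) predicts a flat psd $S_k = N\sigma^2$ in the normalisation used in this lecture.

```python
import numpy as np

rng = np.random.default_rng(1)
N, sigma, n_sets = 64, 2.0, 400        # hypothetical sizes for the illustration

# n_sets independent stretches of white noise, one per row.
noise = sigma * rng.standard_normal((n_sets, N))
n_tilde = np.fft.fft(noise, axis=1)

# Power spectrum of each data set, then the average over data sets.
power = np.abs(n_tilde) ** 2
psd_estimate = power.mean(axis=0)

# Prediction of Equation (21) for white noise.
expected = N * sigma**2

# Sample estimate of E[n~_k n~*_k'] for two different frequencies (k = 3, k' = 7),
# which Equation (20) says should vanish.
cross = np.mean(n_tilde[:, 3] * np.conj(n_tilde[:, 7]))
```

Each individual bin of `psd_estimate` still scatters about `expected` by roughly a fraction $1/\sqrt{n_{\rm sets}}$, which is why smoothing over neighbouring frequencies is often combined with averaging.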



2.3.6. The matched filter for stationary noise 

Putting all these results together, we find that, by 
the convolution theorem, the solution $q$ of 
Equation (||) has the Fourier transform 

$$\tilde{q}_k = N \frac{\tilde{h}_k}{S_k}. \qquad (22)$$

This filter is called the matched filter for the expected 
signal $h$, and this equation is one of the most 
important in practical terms in signal-analysis theory. 
Notice that it involves an inverse weighting with $S_k$: 
if the noise is high at some frequency, the Fourier 
component of the filter at that frequency will be 
reduced, so that it contributes relatively less to the 
convolution that produces the statistic $G$ in 
Equation (||) that searches for the signal. This shows how 
the detector's noise distribution limits its sensitivity. 

Footnote: One must use the identity 
$\sum_{k=0}^{N-1} \exp(-2\pi i jk/N) = N \delta_{j0}$, 
which is proved using the sum formula for a geometric series. 

The detection statistic is linear in the filter $q$, so 
filters are essentially equivalent if they differ by an 
overall constant: $q_j$ and any constant multiple of $q_j$ 
will find the same signals. 
Of course, one has to take the filter normalisation 
into account when inferring the amplitude of the 
signal, which we address below. Because of this, we will 
keep the normalisation indicated in Equation (22), 
with its factor of $N$. 

Notice that, if the noise is white, then the matched 
filter is just the expected signal. It is therefore help- 
ful conceptually to assume white noise when trying to 
understand what happens with filtering, even though 
in a real case one can usually not make this assump- 
tion. 

It should be noted that there are slightly different 
conventions for what one calls the matched filter. 
Some authors call $\tilde{q}_k^*$ (the complex conjugate) the 
filter, or its transform. Note that the time-domain 
version of this is essentially the signal running 
backwards. This difference is just a matter of 
nomenclature: everyone will perform the same mathematical 
operations in order to filter for the signal. 
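
To make Equation (22) concrete, here is a minimal sketch, assuming NumPy, that builds the matched filter for a hypothetical template and a hypothetical coloured psd, and evaluates the detection statistic both as a time-domain sum $\sum_j x_j q_j$ and as the frequency-domain sum $\sum_k \tilde{x}_k \tilde{h}_k^*/S_k$. The template shape and the form of `S` are assumptions made only for illustration.

```python
import numpy as np

rng = np.random.default_rng(2)
N = 128
j = np.arange(N)

# Hypothetical expected signal: a Gaussian-windowed sinusoid.
h = np.exp(-0.5 * ((j - 64) / 10.0) ** 2) * np.cos(2 * np.pi * 10 * j / N)

# Hypothetical coloured psd, even in the frequency index k so that the
# time-domain filter comes out real.
k = np.fft.fftfreq(N, d=1.0 / N)       # indices 0..N/2-1, then -N/2..-1
S = N * (1.0 + (np.abs(k) / 20.0) ** 2)

h_tilde = np.fft.fft(h)
q_tilde = N * h_tilde / S              # Equation (22)
q = np.fft.ifft(q_tilde).real          # time-domain filter q_j

# Some data: the template plus unit-variance random numbers.
x = h + rng.standard_normal(N)

G_time = np.sum(x * q)
G_freq = np.sum(np.fft.fft(x) * np.conj(h_tilde) / S).real

# White noise of unit variance has S_k = N, and the filter reduces to the signal.
q_white = np.fft.ifft(N * h_tilde / np.full(N, float(N))).real
```

The two values of the statistic agree to rounding error, and `q_white` reproduces `h`, which is the statement that for white noise the matched filter is just the expected signal.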



2.3.7. Parseval theorem 



There is an important relation between the Fourier 
transform and the original time-series data. This is 
called the Parseval theorem: 

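$$\sum_{k=0}^{N-1} |\tilde{x}_k|^2 = N \sum_{j=0}^{N-1} |x_j|^2. \qquad (23)$$

By letting $x_j$ be the sum of two variables $y_j + z_j$, it is 
easy to deduce the more general form of this result: 

$$\sum_{k=0}^{N-1} \tilde{y}_k\, \tilde{z}_k^* = N \sum_{j=0}^{N-1} y_j\, z_j^*. \qquad (24)$$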



An interesting deduction from this is what happens 
to the DFT of a long-lived signal when the observation 
time $T$ increases. The right-hand side of this 
equation increases roughly as $N^2$, which means 
that the total power in the signal (the left-hand side) 
also increases as $N^2$ (or $T^2$). If the signal has only 
one significant Fourier amplitude (a narrow-band signal), 
then this amplitude increases linearly with time. 

Similarly, what happens to the noise psd $S_k$ when 
the observation time increases? If we take the 
expectation of the previous equation with $x$ replaced by 
the pure noise amplitude $n$, then the left-hand side 
simply sums over $S_k$ and the right-hand side adds up 
the expectation of the squares of the random values 
$n_j$. On the right, for stationary noise, the expectation 
for each $j$ is the same, so the sum is proportional 
to the number of data samples $N$. In addition, there 
is the explicit factor $N$ on the right-hand side, so the 
whole expression is proportional to $N^2$. On the 
left-hand side, for white noise (just to make the argument 
simple) all the values $S_k$ are the same, and there are 
$N$ of them. There are no extra factors of $N$ there, 
so the expression is just $N$ times the typical value 
of $S_k$ for any $k$. By equating this to the right-hand 
side, we find that the power spectral density $S_k$ itself is 
proportional to $N$, or to the duration of the experiment 
$T$. 
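
Both statements are easy to check numerically. The sketch below, assuming NumPy and hypothetical data lengths, verifies Equation (23) for a random data set and shows that the peak Fourier amplitude of a sinusoid grows in proportion to the number of samples.

```python
import numpy as np

rng = np.random.default_rng(3)

def parseval_sides(x):
    """Return (sum_k |x~_k|^2, N sum_j |x_j|^2) for the unnormalised DFT."""
    x_tilde = np.fft.fft(x)
    return np.sum(np.abs(x_tilde) ** 2), len(x) * np.sum(np.abs(x) ** 2)

lhs, rhs = parseval_sides(rng.standard_normal(100))

def peak_amplitude(n_samples, cycles_per_sample=0.125):
    """Largest |x~_k| for a unit-amplitude cosine observed for n_samples."""
    j = np.arange(n_samples)
    x = np.cos(2 * np.pi * cycles_per_sample * j)
    return np.abs(np.fft.fft(x)).max()

# Observing four times longer multiplies the peak amplitude by four.
amp_ratio = peak_amplitude(512) / peak_amplitude(128)
```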



2.3.8. Nyquist theorem and aliasing 

Notice that, for data sampled with an interval $\Delta t$ 
and therefore a sampling rate $1/\Delta t$, the range of 
frequencies extends only up to $k = N/2$, or $f = 1/(2\Delta t)$. 
So the maximum frequency that can be represented 
by data sampled at a given rate is half the sampling 
rate. Conversely, if we want to represent the 
spectral content of a signal up to a frequency $f_0$, we must 
sample the data at least twice as fast. 

The full statement of this relationship is Nyquist's 
theorem, which is that if an infinitely long continuous 
data stream contains no spectral power at 
frequencies above $f_0$, then sampling that stream at a rate 
of $2f_0$ is sufficient to capture all its information: the 
whole continuous stream, even values between sam- 
pled points, can be reconstructed from the sampled 
ones by using the inverse Fourier transform of the 
sampled points to generate time-series values at any 
value of the time, not just at a sampled time. Then 
if the original signal was band-limited in its spectral 
content as assumed, the reconstructed values will be 
exactly what the original continuous stream would 
have given had it been sampled there. 

This theorem is not very useful as it stands, because 
it assumes an infinitely long data set ($N \to \infty$). 
Every real experiment has a beginning and an end, and 
it generates a finite amount of data. If the data 
stream is long compared to the signal duration, how- 
ever, then one can expect the Nyquist theorem to 
apply. 

How does one know that the data are band-limited in 
the first place? The answer is that the experimenter 
usually arranges this by filtering out all frequencies 
above a certain value before the data are sampled. 
This is to avoid a phenomenon called aliasing. We 
shall define aliasing, and then see why it is something 
to avoid. 

Consider what happens to the DFT in Equation (11) 
if the data contain a signal $h$ that has a frequency 
that is higher than the highest frequency in the DFT, 
$N/(2T)$. Suppose, for concreteness, its frequency is 
$(N/2 + k_0)/T$. Then at $t_j = j\Delta t$ its value would be 

$$h_j = \exp(2\pi i f t_j) = \exp[2\pi i (N/2 + k_0) j/N].$$

By replacing $N/2$ by $N - N/2$ and using the fact that 
$\exp(2\pi i j) = 1$, we find 

$$h_j = \exp[2\pi i (k_0 - N/2) j/N].$$

In other words, the signal at the high frequency of 
$k_0 + N/2$ has the identical values at the sampling 
times as one at the lower frequency of $k_0 - N/2$. If 
the first frequency was outside the frequency band 
represented by our data, the second one could be 
inside it. In this case, the high-frequency signal would 
be mapped onto one in our band, and we would not 
know that it was "really" a higher frequency. If the 
second frequency is still outside our band, then we 
can do the calculation again, and map it to $k_0 - 3N/2$, 
and so on until it appears in our band. It follows that 
a signal of any frequency, no matter how high, will 
be aliased by sampling into a signal with an apparent 
frequency in the bandwidth of our DFT. 
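
The argument can be illustrated directly. This minimal sketch, assuming NumPy and hypothetical values $N = 32$ and $k_0 = 5$, samples a complex sinusoid at the out-of-band frequency index $N/2 + k_0$ and one at the in-band index $k_0 - N/2$, and locates the DFT bin in which the high-frequency signal appears.

```python
import numpy as np

N, k0 = 32, 5                                   # hypothetical values
idx = np.arange(N)                              # the sample index j

# Frequency index N/2 + k0 = 21, above the highest index N/2 = 16.
high = np.exp(2j * np.pi * (N / 2 + k0) * idx / N)
# Frequency index k0 - N/2 = -11, inside the band.
low = np.exp(2j * np.pi * (k0 - N / 2) * idx / N)

# DFT bin that holds the power of the high-frequency signal.
peak_bin = int(np.argmax(np.abs(np.fft.fft(high))))
```

The sampled values of `high` and `low` are identical, and the power of the high-frequency signal lands in the bin that represents the index $k_0 - N/2$ (stored at position $(k_0 - N/2) \bmod N$ in the usual FFT ordering).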

One can think of aliasing in terms of periodicity. The 
inverse Fourier transform, Equation (12), shows that 
the data values are periodic functions of time, since 
the time index $j$ enters the summation only in 
sinusoids that are periodic in $j$ with a period $N$. In 
just the same way, the Fourier transform itself, 
Equation (11), shows that the Fourier coefficients are 
periodic functions of frequency, since the frequency index 
$k$ enters the summation only in sinusoids that are 
periodic in $k$ with a period $N$. Aliasing says that, 
whatever the true spectrum of the continuous data 
is, the act of sampling it will add together frequency 
components from all the periods of the fundamental 
frequency band $(-1/2\Delta t, +1/2\Delta t)$. 

This is bad in an experiment because, even if there 
are no interesting signals at higher frequencies, there 
is usually noise. If the data are sampled without 
filtering this high-frequency noise away, it will con- 
tribute to the noise in the observation band by alias- 
ing. So experimenters generally filter out this noise, 
ensuring that the signal that is sampled already has 
zero (or minimal) power at frequencies above half of 
the sampling frequency. This filtering is usually done 
by analogue means, using filter circuits in the elec- 
tronics, because once the data have been sampled it is 
too late to get rid of the high-frequency components. 



2.3.9. Interpolation in time and in frequency: the 
periodogram 

Once we have band-limited, sampled data, we can 
apply the Nyquist theorem (at least approximately, 
since our data set is finite in duration) and try to 
reconstruct the data values between the sampled val- 
ues. This interpolation is done just by using the in- 
verse Fourier transform at intermediate values: 

$$x(\alpha) = \frac{1}{N} \sum_{k=0}^{N-1} \tilde{x}_k\, e^{2\pi i \alpha k/N}, \qquad (25)$$

where $\alpha$ is a continuous variable related to the 
interpolation time $t$ by $t = \alpha \Delta t$. Naturally, since the 
interpolation theorem is only strictly valid if the data 
set is infinite, when we use it on our finite set we 
must not use it outside the original duration of the 
experiment: the formula is for interpolation, not ex- 
trapolation! 

Because of the deep symmetry between the Fourier 
transform and its inverse, every theorem about func- 
tions of time and frequency has a counterpart with 
time replaced by frequency and frequency by time. 
We illustrated this in our explanation of aliasing in 
terms of periodicity in the frequency domain. The 
interpolation result we have just quoted also has a 
useful counterpart that enables us to look for signals 
of any frequency, not just the discrete ones of the 
DFT. It goes like this. 

Suppose we have a continuous data stream that is 
time-limited: the data stream really is zero before 
and after the time of the experiment. Then the ana- 
logue of the Nyquist theorem is that it can be de- 
scribed completely by a discrete but infinite set of 
Fourier coefficients. We can use the data values to in- 
terpolate between the discrete frequencies using the 
Fourier transform with a continuously variable fre- 
quency index. Of course, this would be exact only if 
we used the infinite set of frequencies, and we have 
from our analysis only a finite set. But if the data 
does not contain much information at higher frequen- 
cies (because of our anti-aliasing filter) then we won't 
be too far wrong to use 

$$\tilde{x}(\beta) = \sum_{j=0}^{N-1} x_j\, e^{-2\pi i j\beta/N}, \qquad (26)$$

where $\beta$ is a continuous variable related to the 
interpolation frequency $f$ by $f = \beta/T$. This, or more 
commonly its associated power spectrum $|\tilde{x}(\beta)|^2$, is called 
the periodogram of the data. It can be used to look 
for narrow-band signals at frequencies between the 
discrete frequencies of the DFT. 

The periodogram defined here is a function of a 
continuous variable $\beta$. There is no easy way to compute 
it except by direct summation if a value is desired 
for some arbitrary value of $\beta$. But if one wants a 
regularly sampled periodogram, i.e. a refinement of 
the DFT by interpolating $m$ equally spaced points 
between the existing points of the DFT, then there 
is an efficient way. This is to make an FFT of a 
data set that consists of the original sampled points 
extended (padded) by $mN$ zeros. By analogy, this 
trick also works for the time-interpolation formula 
Equation (25). 
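
The zero-padding trick can be sketched as follows, assuming NumPy (whose `fft` pads with zeros when the requested length `n` exceeds the data length). The test signal and the value of `m` are hypothetical. The padded transform is compared with the ordinary DFT at the original frequencies and with a direct summation of Equation (26) at one intermediate frequency.

```python
import numpy as np

N, m = 64, 3                                   # m new points between DFT points
t = np.arange(N)
beta_true = 10.4                               # hypothetical off-bin frequency index
x = np.cos(2 * np.pi * beta_true * t / N)

dft = np.fft.fft(x)                            # the N resolved frequencies
padded = np.fft.fft(x, n=(m + 1) * N)          # x followed by m*N zeros
beta = np.arange((m + 1) * N) / (m + 1)        # beta at each refined point

def periodogram_direct(x, b):
    """Direct summation of Equation (26) at an arbitrary value of beta."""
    n = len(x)
    return np.sum(x * np.exp(-2j * np.pi * np.arange(n) * b / n))

# Refined point number 42 corresponds to beta = 42 / (m + 1) = 10.5.
direct_value = periodogram_direct(x, beta[42])
```

Every $(m+1)$-th entry of `padded` reproduces the original DFT, and the entries in between are the interpolated values. The price is an FFT of length $(m+1)N$ instead of $N$.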

It should be noted that there is no uniformity in the 
use of the term periodogram in the literature. Often it 
is used synonymously with what we have called the 
power spectrum, which means it is just evaluated at 
the discrete resolved frequencies of the DFT. This is 
just a subset of the values of the periodogram as we 
have described it, but it is important to understand 
the nomenclature used by whatever author you are 
consulting. 



2.4. The Signal-to-Noise Ratio in Stationary 
Gaussian Noise 



2.4.1. Probability distribution functions 



When the noise is Gaussian, which is a good enough 
starting assumption for us, we can easily compute $p_0$ 
and $p_1$ for the test based on computing the detection 
statistic $G$, given in Equation (|J). As a sum 
of Gaussian-distributed random numbers, $G$ itself is 
Gaussian. We need only compute its mean and variance 
to determine the distribution. Since we assume 
stationary noise, we take the Fourier form of the 
expression for $G$: 

$$G = \sum_j x_j q_j = N^{-1} \sum_k \tilde{x}_k \tilde{q}_k^* = \sum_k \tilde{x}_k \tilde{h}_k^* / S_k, \qquad (27)$$

where the Fourier-domain form follows from the Parseval 
theorem, Equation (24). 

If there is no signal, the data are pure noise ($x_j = n_j$), 
and so the mean of $G$ is 

$$E[G] = \sum_k E[\tilde{n}_k]\, \tilde{h}_k^* / S_k = 0,$$

for zero-mean noise. For the variance we then have 

$$E[G^2] = \sum_k \sum_{k'} E[\tilde{n}_k \tilde{n}^*_{k'}]\, \frac{\tilde{h}_k^* \tilde{h}_{k'}}{S_k S_{k'}}.$$

Using Equation (20) for the expectation value here, 
we define the variance by the symbol $d_0^2$: 

$$d_0^2 = E[G^2] \ (\text{no signal}) = \sum_k \frac{|\tilde{h}_k|^2}{S_k}. \qquad (28)$$

The pdf of $G$ in the absence of a signal is the 
zero-mean Gaussian with this variance: 

$$\text{absent:} \quad p_0(G) = (2\pi d_0^2)^{-1/2} \exp(-G^2/2d_0^2). \qquad (29)$$

If the signal $h$ is present, then the data consist of 
the signal plus the same type of noise as before. The 
expectation of $G$ is just the output of the filter 
applied to the signal, 

$$E[G] = \sum_k |\tilde{h}_k|^2 / S_k = d_0^2,$$

and the variance is the same as before, since the 
signal is deterministic. So the pdf of $G$ in the presence 
of the signal is 

$$\text{present:} \quad p_1(G) = (2\pi d_0^2)^{-1/2} \exp[-(G - d_0^2)^2/2d_0^2]. \qquad (30)$$

Notice that, as we observed before, $G$ is 
dimensionless; therefore so is $d_0$. The expected value of $G$ when 
a signal is present is $d_0^2$, while the standard deviation 
away from that value is $d_0$. Therefore, the 
signal-to-noise ratio (snr) of the filter output is $d_0$. Since $G$ is 
linear in the data $x$, this is the amplitude 
signal-to-noise ratio of the filtered data. The power snr is the 
square of this: 

$$(\text{amplitude snr})^2 = \text{power snr} = d_0^2 = \sum_{k=0}^{N-1} \frac{|\tilde{h}_k|^2}{S_k}. \qquad (31)$$

It is important to recall here that we have not allowed 
the expected signal $h$ to depend on any parameters 
yet: even its amplitude is assumed fixed. The 
detection problem assumes that either the signal is present 
with exactly the assumed amplitude, or it is absent 
and the data are pure Gaussian noise. The filter $q$ is 
constructed from a signal with the expected 
amplitude; and the snr of the filter output, $d_0$, also depends 
on the amplitude of the expected signal $h$. When 
filtering for a signal with a fixed amplitude, the size of 
$G$ is not an indicator of amplitude: it is an 
indicator only of whether we think the signal is present or 
absent. 
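
The two Gaussian distributions of the detection statistic can be checked by simulation. The following sketch assumes NumPy, white Gaussian noise with psd $S_k = N\sigma^2$, and a hypothetical sinusoidal template scaled so that the amplitude snr of Equation (31) is 5; none of these choices comes from the lecture itself.

```python
import numpy as np

rng = np.random.default_rng(4)
N, sigma, trials = 256, 1.0, 4000
j = np.arange(N)

# Hypothetical template, scaled so that the amplitude snr is 5.
shape = np.sin(2 * np.pi * 12 * j / N)
h = 5.0 * sigma * shape / np.sqrt(np.sum(shape**2))

S = np.full(N, N * sigma**2)                   # white-noise psd
h_tilde = np.fft.fft(h)
d_sq = np.sum(np.abs(h_tilde) ** 2 / S)        # Equation (31)

def statistic(x):
    """Frequency-domain form of G, applied to each row of x."""
    return np.sum(np.fft.fft(x, axis=-1) * np.conj(h_tilde) / S, axis=-1).real

noise = sigma * rng.standard_normal((trials, N))
G_absent = statistic(noise)                    # data are pure noise
G_present = statistic(noise + h)               # data contain the signal
```

With these assumptions the sample mean of `G_absent` is close to zero, that of `G_present` is close to `d_sq`, and both have a standard deviation close to the amplitude snr, as Equations (29) and (30) state.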



This illustrates the great importance of having the 
correct information built into the signal template $h$ 
that one looks for: if one assumes a fixed given 
amplitude for the signal, one forces the statistics of the 
test to choose between a signal of that amplitude and 
nothing at all. If the signal really arrives with a 
different amplitude, then one is using an inappropriate test 
and can get the wrong answer, particularly for the 
significance of the detection. When dealing with 
astronomical sources, one does not usually know what 
amplitude to expect. Such a search requires the use 
of a parameter for the amplitude, whose value has to 
be deduced from the data (and in particular from the 
size of $G$). We will treat this case in Section 2.5 
below. 



2.4.2. Detection threshold 



The decision about whether the signal is present or 
absent is taken by looking at the likelihood ratio, or 
more conveniently its logarithm: 

$$\ln \Lambda(G) = \ln \frac{p_1(G)}{p_0(G)} = G - \frac{1}{2} d_0^2. \qquad (32)$$

The Neyman-Pearson test sets a suitable threshold 
on $\Lambda$, which therefore amounts to putting a threshold 
$G_0$ on the detection statistic. Then the false-alarm 
and detection probabilities are given by, respectively, 

$$P_F = \frac{1}{2}\, \mathrm{erfc}\left( \frac{G_0}{\sqrt{2}\, d_0} \right) \quad \text{and} \qquad (33)$$

$$P_D = \frac{1}{2} \left[ 1 - \mathrm{erf}\left( \frac{G_0 - d_0^2}{\sqrt{2}\, d_0} \right) \right], \qquad (34)$$

where $\mathrm{erf}(x)$ is the error function and 
$\mathrm{erfc}(x) = 1 - \mathrm{erf}(x)$ is its complement. (We adopt the 
convention that the error function is an odd function of 
its argument.) Thus, the probabilities governing the 
detection of a known signal buried in Gaussian noise 
are completely determined by the amplitude 
signal-to-noise ratio $d_0$. 

If the right-hand side of Equation (32) is positive, 
then the likelihood ratio is larger than 1, and the 
observed data were more probable if the signal was 
present than if it was absent. This might be a 
reasonable threshold to impose on the test: 
$G_0 = d_0^2/2$. The false-alarm probability for this threshold 
is $0.5\, \mathrm{erfc}(d_0/2^{3/2})$, and the detection probability is 
$0.5[1 - \mathrm{erf}(-d_0/2^{3/2})] = 0.5[1 + \mathrm{erf}(d_0/2^{3/2})]$. For an 
amplitude snr of $d_0 = 5$, we find for this threshold 
$P_F = 0.006$ and $P_D = 0.994$. 

If one wants to bias the test even more against false 
alarms, so that one can be really confident of any 
detections, then one can choose a larger threshold. 
For example, one could set $G_0 = d_0^2$, insisting that 
the filter should reach the expected signal even in 
the presence of noise, which it can do, of course, only 
half the time. Then the false-alarm probability would 
plummet to $0.5\, \mathrm{erfc}(d_0/\sqrt{2})$, which is $3 \times 10^{-7}$ for 
our example of $d_0 = 5$. But the detection 
probability would go down to $0.5[1 - \mathrm{erf}(0)] = 0.5$, as we 
expected. 

The threshold is not chosen by a magic algorithm: it 
must be set at a level that gives one (really, the 
scientific community generally) sufficient confidence that 
the claim of a detection is correct. For the first 
gravitational wave detection, this threshold will probably 
be set fairly high. When one is studying a 
population of sources, where one can tolerate some 
mis-identifications but one wants to get the overall 
population correct, then the threshold can be somewhat 
lower. 

Normally thresholds are set on $G$ at a level of 
several times $d_0$. In this case, it is useful to know the 
asymptotic approximation for the complement of the 
error function, 

$$\mathrm{erfc}(z) = 1 - \mathrm{erf}(z) \approx \frac{e^{-z^2}}{\sqrt{\pi}\, z} \quad (z \gg 1).$$
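
The false-alarm and detection probabilities of Equations (33) and (34) are simple to evaluate. The sketch below uses only Python's standard `math` module; the function names are illustrative. It evaluates both thresholds discussed in this section for an amplitude snr of 5, and also compares the exact `erfc` with the standard leading-order large-argument form $e^{-z^2}/(\sqrt{\pi} z)$, which is assumed here as the asymptotic approximation.

```python
import math

def false_alarm_probability(G0, d0):
    """Equation (33): P_F = 0.5 erfc(G0 / (sqrt(2) d0))."""
    return 0.5 * math.erfc(G0 / (math.sqrt(2.0) * d0))

def detection_probability(G0, d0):
    """Equation (34): P_D = 0.5 [1 - erf((G0 - d0^2) / (sqrt(2) d0))]."""
    return 0.5 * (1.0 - math.erf((G0 - d0**2) / (math.sqrt(2.0) * d0)))

def erfc_asymptotic(z):
    """Leading-order approximation to erfc(z), valid for z >> 1."""
    return math.exp(-z * z) / (z * math.sqrt(math.pi))

d0 = 5.0

# Threshold at which the likelihood ratio equals 1: G0 = d0^2 / 2.
pf_half = false_alarm_probability(d0**2 / 2, d0)
pd_half = detection_probability(d0**2 / 2, d0)

# Stricter threshold G0 = d0^2.
pf_full = false_alarm_probability(d0**2, d0)
pd_full = detection_probability(d0**2, d0)

# Quality of the asymptotic form at z = d0 / sqrt(2).
z = d0 / math.sqrt(2.0)
ratio = erfc_asymptotic(z) / math.erfc(z)
```

For the lower threshold this gives $P_F \approx 0.006$ and $P_D \approx 0.994$. For the stricter one the false-alarm probability drops to about $3 \times 10^{-7}$ while $P_D$ falls to 0.5. At this argument the asymptotic form overestimates `erfc` by a few per cent.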



2.4.3. Increase of signal-to-noise ratio with time 
for long-duration signals 



A continuous, long-duration signal is narrow-band: 
it has relatively few significant Fourier components 
in its spectrum. We saw in Section 2.3.7 that the 
Fourier amplitude of such a signal increases linearly 
with observation time $T$. We also saw that the psd $S_k$ 
increases linearly with $T$. The power signal-to-noise 
ratio $d_0^2$ in Equation (31), being the ratio of a squared 
Fourier amplitude (proportional to $T^2$) to the psd 
(proportional to $T$), therefore also increases 
linearly with time. Since we usually deal with amplitude 
signal-to-noise, it is useful to remember the rule of 
thumb: when observing a narrow-band signal of long 
duration, the amplitude signal-to-noise ratio increases with 
the square-root of the observing time. If LISA can 
detect a binary system in its first year with a threshold 
signal-to-noise ratio of 5, then after 4 years of 
observing the signal-to-noise ratio will be 10. 

The derivation we gave of the dependence of Fourier 
amplitudes on the length of the data set in 
Section 2.3.7 was purely mathematical and in addition 
depended on the particular way we normalised the 
Fourier transform. It is useful, therefore, to try to get 
a more physical feeling for this very important result. 
I can offer two ways of looking at the calculation. The 
first way is to look at the power in the signal, as in 
Parseval's theorem. Since the signal is coherent, its 
Fourier amplitude just increases linearly with 
observing time. The noise behaves differently because it is 
incoherent. The noise amplitude performs a random 
walk. It accumulates only as the square-root of the 
time. Therefore the amplitude signal-to-noise ratio 
increases as the ratio $T/\sqrt{T} = \sqrt{T}$. 

The second way of looking at this is to concentrate on 
the intrinsic signal amplitude, not its Fourier 
transform. This does not change during the observation, 
so it is a constant. But by performing a Fourier 
transform, we remove most of the noise that competes with 
the signal in the data stream by putting it in Fourier 
bins (discrete Fourier coefficients) far from the 
signal frequency. To be detected, the signal needs only 
compete with the noise in the frequency bin that its 
own frequency lies in. Now, as the observing time 
increases, the width of this frequency bin decreases, 
being just the frequency resolution $1/T$. It follows 
that the noise power in this bin also decreases in 
proportion to $1/T$, since the power is uniformly distributed 
over the spectrum. Therefore, the power that the 
signal competes with decreases, and the r.m.s. 
amplitude of the noise decreases as $1/\sqrt{T}$. This leads 
again to a signal-to-noise ratio that increases as the 
square-root of time. 
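
The rule of thumb can be reproduced with Equation (31). This is a minimal sketch, assuming NumPy, white noise with psd $S_k = N\sigma^2$, and a hypothetical sinusoid whose amplitude is chosen so that the amplitude snr is 5 after "one year" of `N_one_year` samples (an arbitrary number used only for illustration).

```python
import numpy as np

def amplitude_snr(n_samples, amplitude, sigma=1.0, cycles_per_sample=0.0625):
    """Amplitude snr from Equation (31) for a sinusoid in white noise."""
    j = np.arange(n_samples)
    h = amplitude * np.cos(2 * np.pi * cycles_per_sample * j)
    S = n_samples * sigma**2                   # flat psd, S_k = N sigma^2
    return np.sqrt(np.sum(np.abs(np.fft.fft(h)) ** 2) / S)

N_one_year = 4096                              # hypothetical samples per "year"
A = 5.0 * np.sqrt(2.0 / N_one_year)            # gives snr 5 after one year

snr_1 = amplitude_snr(N_one_year, A)
snr_4 = amplitude_snr(4 * N_one_year, A)
```

Quadrupling the observing time doubles the amplitude snr, from 5 to 10, in line with the LISA example above.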

Another way of looking at this result, which has more 
general applicability, is that the snr of a signal with $n$ 
cycles (even if they are of uneven length, as in a 
chirping signal) is higher than the snr of a single cycle by 
roughly the factor $\sqrt{n}$. (This assumes that the noise 
is white: clearly if some cycles are at a frequency 
where noise is high, then they should not be counted 
in $n$.) Because the snr of a single cycle cannot be 
significantly enhanced by filtering, the single-cycle snr 
is just the ratio of the signal's intrinsic amplitude $h$ 
to the broad-band time-series noise of the detector. 
Therefore, the snr of a multi-cycle signal is 
comparable to that of a single-cycle signal with an effective 
amplitude of $h_{\rm eff} = h\sqrt{n}$. One often sees sensitivity 
diagrams for gravitational wave sources (as in, for 
example, Thorne 1987) in which many kinds of sources 
are plotted on a vertical scale that is their effective 
amplitude. This allows a comparison of signal with 
noise for a range of sources on a single diagram. For 
sources that have several cycles, such a plot assumes 
that the matched filtering has been done correctly. 



2.4.4. Mismatch between template and signal 

So far we have assumed that the filter was con- 
structed from a template h that matched the signal 
exactly. But in realistic situations this match may 
be imperfect. Perhaps the theory of the source is 
incompletely understood, or perhaps one is using a 
crude representation of the signal to make the com- 
putational job easier. It is easy to see what the con- 
sequence of this mismatch will be. 

The variance of the detection statistic $G$ will be the 
same as in the ideal case, Equation (28), where $h$ 
represents the template used to construct the filter. 
Suppose, however, that the signal arriving has a 
waveform $H_j$. Then the actual value of $G$ will be $\sum_j H_j q_j$. 
If this is close to the ideal $G$, then the earlier 
discussion will be substantially correct. 

This leads to an important observation: in the con- 
struction of a matched filter for a multi-cycle signal, 
it is more important to track the phase of the incom- 
ing signal than to match variations in its amplitude. 
The reason is that the correlation sum must come out 
right, and this requires $q_j$ to be positive when $H_j$ 
is positive and $q_j$ to be negative when $H_j$ is negative. 
If the template and signal get out of phase at some 
time, they will make negative contributions to the to- 
tal sum, and the value of G (and hence the sensitivity 



of the filter) will drop quickly. On the other hand, 
if the amplitudes are not exactly right, the power in 
the correlation will be affected, but only modestly. 

This becomes obvious if we think of the Fourier 
transform as a matched filter for single-frequency 
sinusoidal signals. If the frequency of the template 
sinusoid differs from that of the incoming sinusoid by as 
little as the frequency resolution of the observation, 
$1/T$, the two waveforms will be orthogonal, and the 
correlation will be zero: this is mathematically the 
way the Fourier transform tells us what the true fre- 
quency is. So if the signal and template get out of 
phase by just one cycle in the observation time, the 
sensitivity of the test is destroyed. This can be the 
result of a very small fractional error in the frequency 
of the signal. A comparably small change in the am- 
plitude will have a negligible effect on the correla- 
tion. It is important to bear this lesson in mind when 
constructing templates for long-lived, multi-cycle sig- 
nals. 
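
The contrast between phase errors and amplitude errors can be seen in a small numerical experiment. This sketch assumes NumPy and white noise, so that the filter is proportional to the template and the correlation is simply $\sum_j H_j q_j$; the signal parameters are hypothetical. The signal completes 50 cycles during the observation, one template is wrong in frequency by $1/T$ (a 2% error), and another is wrong in amplitude by the same 2%.

```python
import numpy as np

N = 1024
j = np.arange(N)
k_sig = 50                                     # cycles of the signal in time T

signal = np.sin(2 * np.pi * k_sig * j / N)

def overlap(template):
    """Correlation sum_j H_j q_j, relative to its value for a perfect match."""
    return np.sum(signal * template) / np.sum(signal * signal)

matched = overlap(signal)
# Template frequency off by one resolution element 1/T: one extra cycle.
one_cycle_off = overlap(np.sin(2 * np.pi * (k_sig + 1) * j / N))
# Template amplitude off by 2%.
amp_error = overlap(1.02 * signal)
```

The frequency error destroys the correlation entirely, while the equally small amplitude error changes it by only 2%.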



2.4.5. Noise plots 

Beginners in signal analysis, and this includes many 
sophisticated theoretical physicists, are often 
bewildered when they see experimentalists plot the noise 
in their gravitational wave detectors with a vertical 
axis labelled "metres per root Hertz" or "m Hz$^{-1/2}$". 
What can the square-root of a Hz mean? 

We are now at a place where we can understand this 
mystery. These graphs typically plot the noise 
amplitude in an experiment. Now, we have seen that the 
noise has power that is distributed smoothly over the 
spectrum, and this is measured by the psd. We would 
expect, therefore, that the psd should have 
dimensions of power (say, m$^2$ for an output $x$ that is 
measured in metres) per unit frequency. Our definition 
of the psd, $S_k$, does not have these units because we 
drop the dimension of time (or frequency) when we 
form the DFT. The correct conversion to the actual 
power per unit frequency at a particular frequency, 
which we shall call $S(f)$, involves dividing $S_k$ by the 
frequency interval $\Delta f = 1/T$ and (for our convention 
for the placement of the factors of $N$ in the DFT) by 
$N^2$. This gives [in the same sense of correspondence 
as in Equation (15)] 

$$S(f_k) \leftrightarrow \frac{T}{N^2} S_k. \qquad (35)$$

This is the noise power, but it is usually more 
informative to plot the noise amplitude, in order to 
compare it with expected signal amplitudes. The noise 
amplitude is the square root of $S(f)$ and is typically 
denoted by $\tilde{n}(f)$: 

$$\tilde{n}(f_k) = [S(f_k)]^{1/2} \leftrightarrow [(T/N^2) S_k]^{1/2}. \qquad (36)$$

Clearly this has dimensions of amplitude (metres or 
whatever) per root Hz. The conversion to a more 
physical quantity, such as metres, is to multiply the 
noise amplitude by the square root of the bandwidth 
that the signal occupies, because it is in this band 
that the signal fights the noise, as we discussed in 
the previous section. Then one gets an overall noise 
amplitude in metres, to compare with the signal. 
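
The conversions in Equations (35) and (36) can be written as two short functions. This is a sketch assuming NumPy; the noise level and sampling interval are hypothetical, and the density is taken over the full band of width $1/\Delta t$ (positive and negative frequencies together), which is an assumption about convention rather than something fixed by the text.

```python
import numpy as np

def psd_per_hz(S_k, n_samples, dt):
    """Equation (35): convert S_k to power per unit frequency, with T = N dt."""
    T = n_samples * dt
    return (T / n_samples**2) * np.asarray(S_k, dtype=float)

def noise_amplitude(S_k, n_samples, dt):
    """Equation (36): noise amplitude in units of amplitude per root Hz."""
    return np.sqrt(psd_per_hz(S_k, n_samples, dt))

# Hypothetical white noise: r.m.s. 1e-11 m per sample, one sample per second.
N, dt, sigma = 1000, 1.0, 1e-11
S_k = np.full(N, N * sigma**2)                 # white-noise psd, S_k = N sigma^2

asd = noise_amplitude(S_k, N, dt)              # m / sqrt(Hz)

# Back to metres: multiply by the square root of the bandwidth occupied,
# here the whole band of width 1/dt.
rms_full_band = asd[0] * np.sqrt(1.0 / dt)
```

For white noise the result is $\sigma\sqrt{\Delta t}$ at every frequency, independent of the length of the data set, and multiplying by the square root of the full bandwidth recovers the r.m.s. value $\sigma$ of the time series.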



2.5. Dependence on Parameters 



2.5.1. The maximum likelihood estimator 



In general we do not know everything about an ex- 
pected signal: we only know the form of the signal 
h(t) as a function of a number of parameters. With 
astronomical sources, the amplitude of the signal and 
its time of arrival are almost always unknown pa- 
rameters. There may be others. For example, the 
waveform from a coalescing supermassive black-hole 
binary will depend on two further parameters: the 
chirp mass (a certain combination of the masses of 
the members of the binary), and the phase of the 
wave at the time of arrival. In such a case in order 
to detect the signal we must also determine its pa- 
rameters. In the maximum likelihood method, the 
natural approach is to find values of the parameters 
of the family of templates that maximises the likeli- 
hood function for a given set of observed data. The 
set of optimum parameter values is called the maxi- 
mum likelihood estimator. 

Let $\theta = \{\theta_1, \theta_2, \ldots, \theta_m\}$ be the set of $m$ unknown 
parameters of the signal $h(t; \theta)$, and let (as in 
Equation (fy)) $x$ stand for the observed data function $x_j$. 
As in the case of a completely known signal, we 
consider two probability density functions $p_{\theta_0}(x; \theta)$ and 
$p_0(x)$, depending on whether a signal with parameter 
set $\theta_0$ is present or absent. From these we form the 
parameter-dependent likelihood ratio based on a test 
for whether the data contain a signal with a 
parameter set $\theta$, which may or may not be the same as the 
parameter values $\theta_0$ that the signal really has: 

$$\Lambda[x; \theta, \theta_0] = \frac{p_{\theta_0}(x; \theta)}{p_0(x; \theta)}. \qquad (37)$$

We define the maximum likelihood estimator (MLE) $\hat{\theta}$ 
of the set of parameters $\theta$ to be the set that 
maximises the likelihood ratio $\Lambda[x; \theta, \theta_0]$. Hopefully, this 
will not be too different from $\theta_0$. The MLE can be 
found by solving the set of simultaneous equations 

$$\frac{\partial}{\partial \theta_j} \Lambda[x; \theta, \theta_0] = 0 \quad \text{for } j = 1, \ldots, m. \qquad (38)$$

Let us construct this function in the presence of 
stationary Gaussian noise. The appropriate filter $\tilde{q}_k(\theta)$ 
is now parameter-dependent, 

$$\tilde{q}_k(\theta) = N \tilde{h}_k(\theta)/S_k.$$

This family of filters produces a family of detection 
statistics 

$$G(\theta) = \sum_k \tilde{x}_k\, \tilde{h}_k^*(\theta)/S_k.$$

To apply the likelihood test, let us test for the 
presence of a signal that matches our template family for 
some single set of parameter values $\theta_0$. The pdf for 
the data if there is no signal at all is the same as we 
had before, Equation (29): 

$$\text{absent:} \quad p_0(G(\theta)) = (2\pi d_\theta^2)^{-1/2} \exp(-G(\theta)^2/2d_\theta^2), \qquad (39)$$

where the variance of the random variable $G(\theta)$ is 

$$d_\theta^2 = E[G(\theta)^2] \ (\text{no signal}) = \sum_{k=0}^{N-1} \frac{|\tilde{h}_k(\theta)|^2}{S_k}. \qquad (40)$$



When a signal is present with parameter set $\theta_0$, the 
pdf depends on both the signal parameters and the 
template parameters: 

$$\text{present:} \quad p_{\theta_0}(G(\theta)) = (2\pi d_\theta^2)^{-1/2} \exp[-(G(\theta) - d_\theta^2)^2/2d_\theta^2]. \qquad (41)$$

The variance is the same as in Equation (40). The 
false-alarm and detection probabilities have the same 
form as for the non-parametric signal, Equation (33) 
and Equation (34). 

The likelihood ratio has the logarithm 

$$\ln \Lambda = \ln \frac{p_{\theta_0}(x; \theta)}{p_0(x; \theta)} = G(\theta) - \frac{1}{2} d_\theta^2. \qquad (42)$$



The MLE is the value of $\theta$ that maximises this 
expression for a given data set. 

Normally, one cannot maximise this as a function of 
a continuous set of values of $\theta$. Instead, one 
establishes a grid of discrete values of the parameters that 
covers the important range of parameter values with 
sufficient density not to miss signals with intermedi- 
ate parameter values. If there are m parameters, this 
grid is m-dimensional; since it may in some cases be 
necessary to test for hundreds or thousands of grid 
values along each dimension, the problem of finding 
the MLE in a multi-parameter template can be very 
demanding computationally. 

Once the maximum value of the likelihood ratio has 
been found on a grid of filters, the spacings between 
parameter values may be further refined to get as 
close an approximation to the MLE as desired. Note 
that the errors in determining various parameters can 
be correlated, and this can make some parameters 
more difficult to determine than others. These prob- 
lems have been extensively discussed in the gravita- 
tional wave field in relation particularly to coalescing 
binary signals. We will give references when we ad- 
dress this family of signals below. 
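
The grid search described above can be sketched in a few lines. This is only an illustration under simplifying assumptions that are not part of the lecture: the noise is white with unit variance (so $S_k = 1$ and the sums can be done in the time domain), and the template family is a hypothetical one-parameter family of unit-amplitude sinusoids labelled by frequency. The function names are invented for the example.

```python
import math
import random

def log_likelihood_ratio(data, template):
    """ln(Lambda) = G - d^2/2 for white, unit-variance noise (S_k = 1)."""
    g = sum(x * h for x, h in zip(data, template))   # detection statistic G(theta)
    d2 = sum(h * h for h in template)                # variance d_theta^2
    return g - 0.5 * d2

def sinusoid(freq, n):
    """Hypothetical template family: unit-amplitude sinusoid, freq in cycles/sample."""
    return [math.sin(2.0 * math.pi * freq * j) for j in range(n)]

def grid_search(data, freqs):
    """Return the grid value of the parameter that maximises ln(Lambda)."""
    return max(freqs, key=lambda f: log_likelihood_ratio(data, sinusoid(f, len(data))))

rng = random.Random(0)
n, true_freq = 512, 0.1
data = [s + rng.gauss(0.0, 1.0) for s in sinusoid(true_freq, n)]

freqs = [0.02 + 0.005 * i for i in range(37)]   # grid from 0.02 to 0.20
best_freq = grid_search(data, freqs)
print("grid point with the largest ln(Lambda):", round(best_freq, 3))
```

With m parameters the single loop over `freqs` becomes a loop over an m-dimensional grid, which is where the computational cost mentioned above comes from.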



2.5.2. Amplitude as a parameter 

One parameter that is part of almost every signal 
LISA might search for is the amplitude. Most sources 
could be at a range of distances, so the amplitude is 
not predictable. Even known binary systems have 
uncertain distances, and even more uncertain incli- 
nations, so the amplitude of LISA's response will not 
be known in advance. Fortunately, the amplitude pa- 
rameter is straightforward to determine. We discuss 
it as a good concrete example of the general proce- 
dure outlined above. 

Let us suppose that we are searching for a waveform 
described by a function $h(t)$, which has some suitably
chosen (arbitrary) amplitude. The incoming signal
amplitude is $A_0$: $h(t;A_0) = A_0 h(t)$. The family of
templates that we use has an amplitude parameter
$A$: $h(t;A) = A h(t)$. Then the filter family will be
$q(t;A)$, whose values at the sampling times are

$$q_k(A) = A h_k/S_k.$$

Then the detection statistic is

$$G(A) = A \sum_k x_k h_k^*/S_k.$$



For this parameter, the sum has to be done only once:
we can treat the parameter dependence analytically.
The variance of this statistic is

$$d_A^2 = A^2 d_0^2,$$

where $d_0^2$ is given in Equation (B8). The standard
deviation of the filter output is thus $A d_0$.

To deduce the snr when a signal is present, take the
expectation of $G(A)$:

$$E[G(A)] = A \sum_k E[x_k]\, h_k^*/S_k.$$

Since the data stream contains a signal with amplitude
$A_0$, this evaluates to

$$E[G(A)] = A A_0 d_0^2.$$

The snr is, therefore, this expectation divided by the
standard deviation:

$$\mathrm{snr} = A_0 d_0. \tag{43}$$



It is satisfying that the snr depends on the intrinsic 
amplitude of the signal and not the amplitude of the 
parameter-dependent filter. This is because the filter 
amplitude also appears in the noise output of the 
filter. 

Now we can search through the filters to determine 
the one that matches the amplitude. This is not re- 
ally necessary in this simple case, since any filter can 
be used to deduce the snr and infer whether the like- 
lihood threshold has been exceeded. But the pro- 
cedure will illustrate what happens in more difficult 
cases. The logarithm of the likelihood function in
this case is (from Equation (42))

$$\ln\Lambda = G(A) - \frac{1}{2}A^2 d_0^2. \tag{44}$$



The maximum of $\Lambda$ occurs where this exponent reaches
a maximum, which leads to the condition for the best
parameter value $\hat{A}$:

$$\sum_k x_k h_k^*/S_k - \hat{A} d_0^2 = 0.$$

This can be solved for $\hat{A}$:

$$\hat{A} = \frac{\sum_k x_k h_k^*/S_k}{d_0^2}. \tag{45}$$



It is important to realise that the variable $\hat{A}$ is a
random variable. It has an expected value and a
variance around that expectation. Its expectation is
similar to the calculation of the snr:

$$E[\hat{A}] = \frac{\sum_k A_0 h_k h_k^*/S_k}{d_0^2} = A_0. \tag{46}$$



Therefore, the most likely value we should deduce for 
the amplitude of the signal by the method of maxi- 
mum likelihood is the true amplitude of the incoming 
signal. 

The variance of this estimate $\hat{A}$ is also important and
instructive. It is easy to show that

$$E[\hat{A}^2] - E[\hat{A}]^2 = d_0^{-2}. \tag{47}$$



The standard deviation of the amplitude estimate is
thus $1/d_0$. To understand this, calculate the relative
accuracy of determining the amplitude. This is

$$\frac{\delta A}{A_0} = \frac{1/d_0}{A_0} = \frac{1}{\mathrm{snr}}.$$



This is our first example of a general rule: the ac- 
curacy with which a parameter can be determined 
improves directly with the snr with which the signal 
is detected. If a signal is detected with snr of 10, 
then its amplitude can be determined to an accuracy 
of 10%. 
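
The statements that the estimator of Equation (45) has expectation $A_0$ and standard deviation $1/d_0$ can be checked numerically. The following is a minimal Monte Carlo sketch, assuming white noise of unit variance (so $S_k = 1$) and a hypothetical sinusoidal template; neither choice comes from the lecture, and the names are illustrative.

```python
import math
import random

def estimate_amplitude(data, template):
    """Amplitude MLE for white noise with S_k = 1: sum(x h) / d_0^2."""
    d0_sq = sum(h * h for h in template)
    return sum(x * h for x, h in zip(data, template)) / d0_sq

rng = random.Random(1)
n, true_amplitude, trials = 64, 2.0, 2000
template = [math.sin(2.0 * math.pi * 0.05 * j) for j in range(n)]  # hypothetical h
d0 = math.sqrt(sum(h * h for h in template))

estimates = []
for _ in range(trials):
    # Each trial: the same signal A_0 h plus a fresh realisation of the noise.
    data = [true_amplitude * h + rng.gauss(0.0, 1.0) for h in template]
    estimates.append(estimate_amplitude(data, template))

mean = sum(estimates) / trials
std = math.sqrt(sum((a - mean) ** 2 for a in estimates) / (trials - 1))
snr = true_amplitude * d0

print(f"mean of estimates : {mean:.3f}  (true amplitude {true_amplitude})")
print(f"std of estimates  : {std:.4f}  (1/d_0 = {1.0 / d0:.4f})")
print(f"relative accuracy : {std / true_amplitude:.4f}  (1/snr = {1.0 / snr:.4f})")
```

The scatter of the estimates, divided by the true amplitude, should come out close to 1/snr, which is the rule stated above.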

Thresholds must still be set with regard to the false-alarm
probability. For the statistic $G(A)$, suppose
the threshold is set at $G_0(A)$. Then, since the variance
is $d_A^2$, the false-alarm probability is

$$P_F(A) = \frac{1}{2}\,\mathrm{erfc}\left(\frac{G_0(A)}{\sqrt{2}\,d_A}\right).$$

If we set the threshold to a given multiple of the
standard deviation $d_A$ of the filter output, then this
probability will be independent of which parameter
value $A$ we have chosen, and the estimation of false-alarm
probabilities will be the same as for the case
with no parameters.
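
A small numerical illustration of this point, using only the standard-library `math.erfc`. The value of $d_0$ and the choice of a threshold at five standard deviations are arbitrary assumptions made for the example.

```python
import math

def false_alarm_probability(threshold, sigma):
    """P_F = (1/2) erfc(threshold / (sqrt(2) sigma)) for zero-mean Gaussian output."""
    return 0.5 * math.erfc(threshold / (math.sqrt(2.0) * sigma))

d0 = 3.0          # hypothetical value of d_0 for some template
multiple = 5.0    # threshold expressed in units of the standard deviation d_A

probabilities = []
for amplitude in (0.5, 1.0, 4.0):
    d_a = amplitude * d0                      # d_A = A d_0
    p = false_alarm_probability(multiple * d_a, d_a)
    probabilities.append(p)
    print(f"A = {amplitude}: P_F = {p:.3e}")
```

The three printed probabilities agree, because the filter amplitude scales the threshold and the noise output by the same factor.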



2.5.3. Time-of-arrival as a parameter 



Most signals have structure in time that makes some 
times different from others. If a signal has a spe- 
cific beginning, we call that the time-of-arrival. The 
signal from the coalescence of two black holes does 
not start at a special time, but it finishes at the co- 
alescence time. A signal from a chirping binary may 
not start or finish at any time during the observation, 
but one can assign a time to it by measuring the time 
when the frequency reaches some particular fiducial 
frequency. In general, we will refer to these times as 
fiducial times. But the general problem is more often 
called the determination of the time-of-arrival of a 
signal. The fiducial time of a signal is as important 
a parameter as the amplitude: in astronomy we can 
almost never predict when a signal should arrive. 

The family of signals that arrive at different times 
but are otherwise identical is a simple family, just
obtained from one another by time-translation. We
begin with a particular filter for one of them, say
where the fiducial time is at $t = 0$. Let us call this
$q(t)$. Then the filter for a signal that has fiducial
time $t_c = c\,\Delta t$ is just $q(t - t_c)$. (This assumes that
the noise, which weights the filter, is the same at all
times: stationary noise again.) Then the detection
statistic for a signal with arbitrary fiducial time is
(compare with the detection statistic defined earlier)

$$G(t_c) = \sum_j x(t_j)\,q(t_j - t_c). \tag{48}$$



This is just a convolution of the data with the filter:
the filter slides through the data, performing a
sum at each step. If the statistic $G(t_c)$ becomes
exceptionally large, then we should look at that time
for the signal. Each value of $t_c$ represents a different
filter, and the statistics of each one are as before.

The MLE follows from constructing the logarithm of
the likelihood function, which in this case is
$\ln\Lambda = G(t_c) - d_0^2/2$. Taking the derivative with respect to $t_c$
involves only differentiating the first term, since the
second is constant. Finding the fiducial time therefore
amounts to looking for a local maximum in $t_c$
of the detection statistic. If this maximum exceeds
the detection threshold, one has confidence that the
signal arrived with that fiducial time.

The appropriate threshold for $G(t_c)$ must take into
account the possibility that, among the N random
values $G(t_c)$ for the N possible times $t_c$, at least one
will reach the observed value of G even if there is
no signal. That is, because we perform the filtering
N times, the probability that at least one such filter
will cross any given threshold even in the absence of
a signal is N times larger. The threshold must be
raised to compensate for this.
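
To get a feeling for how much the threshold must be raised, one can solve for the threshold at which N times the single-filter false-alarm probability equals a chosen overall value. The sketch below does this by bisection. It assumes Gaussian filter outputs and uses the "N times larger" approximation from the text; the overall false-alarm probability of 0.01 and the value of N are arbitrary choices for the example.

```python
import math

def false_alarm_probability(threshold_sigma):
    """Single-filter P_F for a threshold given in units of the standard deviation."""
    return 0.5 * math.erfc(threshold_sigma / math.sqrt(2.0))

def threshold_for_trials(target, n_trials):
    """Threshold (in sigma) at which n_trials * P_F(threshold) = target."""
    lo, hi = 0.0, 40.0
    for _ in range(200):                      # bisection on a monotonic function
        mid = 0.5 * (lo + hi)
        if n_trials * false_alarm_probability(mid) > target:
            lo = mid                          # too many false alarms: raise threshold
        else:
            hi = mid
    return 0.5 * (lo + hi)

single = threshold_for_trials(0.01, 1)
many = threshold_for_trials(0.01, 10**8)
print(f"one filter    : threshold = {single:.2f} sigma")
print(f"10^8 filters  : threshold = {many:.2f} sigma")
```

Because the Gaussian tail falls off so steeply, even a very large number of trial fiducial times raises the threshold by only a few standard deviations.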

The estimation of the accuracy of the measurement
of the fiducial time would occupy too much space
here. I refer interested students to a large literature
on this and related subjects: Schutz 1991; Cutler &
Flanagan 1995; Nicholson & Vecchio 1997.

While it might seem that filtering for N different fidu- 
cial times would be a difficult task in a long obser- 
vation, it is actually much easier than it might seem, 
thanks to the efficiency of the fast Fourier transform 
algorithm. The use of this for determining fiducial 
times is described below. 



3. LOOKING FOR EXPECTED SIGNALS
WITH LISA



Now we turn to the specific problems presented by 
the signals we expect LISA to see. Coherent sig- 
nals, such as those from binaries, will be found by 
matched filtering. For these we must examine what 
the filter should look like, what its parameters must 
be, and how much computing power will be needed 
to cover a reasonable range of parameter values. It 
is convenient to distinguish between short-duration 
signals, which last for less than the duration of the 
observing period of a few years, and long-duration 
signals, which are present for the whole observation. 
We must also consider incoherent signals, which are 



the gravitational wave backgrounds due perhaps to 
binary systems in the Galaxy and in external galax- 
ies, or to gravitational waves from the Big Bang. 



3.1. Phase and Amplitude Modulation by Detector Motion



Even short-duration signals will normally last long 
enough for the detector to move around the Sun by a 
significant amount during the observation, introduc- 
ing a Doppler shift in the observed frequency. Be- 
cause all coherent signals will be affected by the de- 
tector's motion, we begin by examining its effect. 

As LISA orbits the Sun, it moves toward and away 
from any fixed source. This produces a Doppler shift 
in the apparent frequency of the signal; the shift depends
on the position of the source in the sky. Moreover, the 
plane containing LISA's arms rotates once per year, 
and within that plane the orientation of the arms ro- 
tates as well. The effect is to scan any given source 
along a path through the antenna pattern of each 
of LISA's interferometers. This produces a modula- 
tion of the amplitude of the response of each detector 
to the amplitude of the wave. This modulation de- 
pends on the position of the source in the sky and 
on the polarisation of the wave. Any other space- 
based gravitational wave detector will inevitably ex- 
perience analogous effects, though they may differ in 
detail from those of LISA. Our discussion will also 
apply in its basics to ground-based detectors, which 
are carried through the Solar System by the motion 
of Earth. 

If we know, or think we know, the way the intrinsic 
frequency and amplitude of the detected waves de- 
pend on time, then we can use the position-depend- 
ence of the frequency modulation to infer the posi- 
tion of the source, and we can use the polarisation- 
dependence of the amplitude modulation to infer the 
intrinsic polarisation of the wave. These effects are 
therefore crucial in enabling LISA to extract useful 
information from its observations. 

The amplitude modulation depends on the position 
of the source on the sky, but for a long-lived source 
LISA's orbital motion performs an average over many 
positions, so the average amplitude of LISA's re- 
sponse to a source is only weakly dependent on the 
source's position. In the sensitivity diagram of LISA 
(Figure 2 of my lecture at this school on sources
for LISA; also in Bender et al. 1996), the detection
threshold is drawn at the intrinsic amplitude a signal
would have to have in order that LISA's average response
would be $5\sigma$ above the LISA noise. This level
is actually at $5\sqrt{5}$ above the LISA raw noise curve.
The reason for this is that the vertical scale in this 
figure is the intrinsic amplitude of the signal, not the 
response of the detector. 



The Doppler effect is more fruitfully regarded as a 
phase modulation, by which I mean that the detec- 
tor moves through the highly-regular wave pattern in 
a way that brings it across successive peaks (phase 
zero) in varying amounts of time. Using this picture, 
it is clear that there is significant phase modulation 
during a LISA observation only if the wavelength of 
the gravitational wave is comparable to or smaller
than the diameter of LISA's orbit, $\lambda_{\rm crit} = 2$ AU.
This means that phase modulation is unimportant
at gravitational wave frequencies below about
$f_{\rm crit} = c/\lambda_{\rm crit} = 1$ mHz. Below this frequency, all directional
information comes from amplitude modulation
(Peterseim et al. 1997).



3.1.1. Analogy with the interference between two signals



Phase modulation is the basis of LISA's ability to tell 
the direction of a source on the sky. To see how this 
works, it is helpful to begin with an analogy that is 
more familiar, but which is in essence the same effect: 
signal interference patterns. 

Most interesting sources for LISA lie below about 
10 mHz, for which LISA's orbit spans at most 10 
gravitational wavelengths. To see what this means 
for data analysis, consider two sources of gravita- 
tional waves with identical (and constant) frequen- 
cies, but in different positions on the sky. Radiation 
from the two sources creates an interference pattern 
in the solar system: there are places where they hap- 
pen to add to each other (constructive interference) 
and places where they subtract (destructive interfer- 
ence). 

If the wavelength is short compared to 1 AU, then 
there will be many such maxima and minima within 
the orbit of LISA, provided the waves arrive from 
rather different directions. If the waves arrive from 
almost the same direction, then the regions of con- 
structive and destructive interference will be sepa- 
rated by much more than a wavelength, and LISA 
will not be able to tell that there are two different 
sources. 

Now, a stationary detector would experience a coher- 
ent superposition of the two signals, which might in- 
terfere constructively or destructively, depending on 
the exact position of the detector. But as the detec- 
tor does not move, it will never know that there are 
two different sources: it will simply receive a single 
signal with a certain phase and amplitude. 

A moving detector like LISA, by contrast, will sample 
the interference pattern of the radiation and see that 
there are two different sources. This will only be 
possible if the sources are sufficiently far apart on 
the sky for the interference pattern to be "bumpy" 
within LISA's orbit. This effectively means that the 
sources must be separated by an angle greater than 
the ratio $\lambda_{\rm gw}/(1\ \mathrm{AU})$, in radians.
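
The numbers quoted in this section follow from simple arithmetic with the gravitational wavelength. The snippet below is a small sketch that reproduces them; the constants are standard values, and the 10 mHz example frequency is the one used in the text.

```python
C = 299_792_458.0        # speed of light, m/s
AU = 1.495978707e11      # astronomical unit, m

def wavelength(frequency_hz):
    """Gravitational wavelength in metres."""
    return C / frequency_hz

# Frequency at which the wavelength equals the diameter of LISA's orbit.
f_crit = C / (2.0 * AU)

# At 10 mHz: how many wavelengths fit across the orbit, and the smallest
# angular separation, lambda_gw / (1 AU), at which two sources are resolved.
wavelengths_across_orbit = 2.0 * AU / wavelength(0.01)
min_separation_rad = wavelength(0.01) / AU

print(f"f_crit                    = {f_crit * 1e3:.2f} mHz")
print(f"wavelengths across orbit  = {wavelengths_across_orbit:.1f}")
print(f"minimum separation        = {min_separation_rad:.2f} rad")
```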

The power that a detector sees will be the power in
the sum of the two signals. Source 1, in a location on
the sky that we call $\theta_1$, produces a waveform $h_1(t;\theta_1)$
in the detector as it moves through the wave pattern,
and source 2 (at the position we call $\theta_2$) produces
$h_2(t;\theta_2)$. The power at time $t_j$ is (suppressing the
angular arguments here)

$$\begin{aligned}
P_j &= [h_1(t_j) + h_2(t_j)]^2 \\
    &= [h_1(t_j)]^2 + [h_2(t_j)]^2 + 2h_1(t_j)h_2(t_j).
\end{aligned} \tag{49}$$

The first two terms are constant on time-scales longer 
than the period of the wave (at most a few hours in 



the case of LISA) , but the final term is the one that 
will change on the orbital time-scale because of the 
motion of the detector: 

$$\text{Interference term} = h_1(t_j;\theta_1)\,h_2(t_j;\theta_2). \tag{50}$$

This is the effect of the interference. 



3.1.2. Filtering with the wrong filter: 
direction-finding with LISA 



Now, in reality there will not be two sources with 
identical frequencies. But this thought experiment 
helps us to understand how LISA can use phase mod- 
ulation to determine the direction to a single source. 
In the data analysis, we will pass the data through 
a filter that represents the waveform expected from 
a source in a particular direction. Let us suppose 
that this is in the position of source 1 above, so that
we use $h_1(t;\theta_1)$ as the filter. (For simplicity, let us
assume white noise, where the filter is the expected
signal.) Now suppose there really is a gravitational
wave of this frequency in the data, but that it comes
from the position of source 2. Then the data will
contain the signal $h_2(t;\theta_2)$. When we filter the data,
the expectation of the output of the filter is

$$E[G] = \sum_j h_1(t_j;\theta_1)\,h_2(t_j;\theta_2). \tag{51}$$



But this is just the interference term in Equation (49).
Therefore, the problem of finding a signal with a filter
that assumes the wrong angular position is just
the same as seeing the interference pattern between 
the two signals. This means that we will have a good 
response if the position assumed by the filter and 
the position of the true signal are close enough on 
the sky. If they are not, then there will be a poor 
response. This effect gives LISA its directional ca- 
pabilities, but at the price of having to perform the 
filtering for many independent locations on the sky. 

This means that the angular resolution of LISA when 
looking at a weak source, near its noise level, will be 
at best of order 1/10 radians. However, this resolu- 
tion improves essentially directly with improving snr, 
because for a strong signal one can notice small vari- 
ations in the output of the filter when its position 
is slightly changed. Therefore, for a coalescing black 
hole binary with a signal-to-noise ratio of $10^4$, the
angular resolution could in principle be as good as
$10^{-5}$ rad, or a few arcseconds. This is, however,
unrealistic, for three reasons: first, such sources spend
most of their time (and their accumulated signal) at 
sub-mHz frequencies, where there is no detectable 
phase modulation; second, there is a correlation of 
errors between position variables and other parame- 
ters that degrades the accuracy; and third, noise from 
galactic binaries may degrade both the overall signal-to-noise
ratio and the accuracy of position-sensing.
Realistic estimates (Cutler & Vecchio 1997) give typical
angular accuracies of about 1 degree for massive
black-hole coalescences. For signals at a constant
frequency, like binaries, the situation can be much better,
with accuracies approaching the arcminute scale.



3.2. Analysis of Data for Short-Duration Signals



Some of LISA's most exciting signals, those from coa- 
lescing black-hole binaries, will have lifetimes shorter 
than the duration of the experiment if they enter the 
LISA frequency window at all. Therefore one of the 
most elementary parameters associated with the signal
is the time of coalescence, $t_c$. This is a signal
that requires a fiducial-time parameter, as discussed 
earlier. 



3.2.1. Construction of coalescing binary templates 



The merger of two black holes is a very complicated 
dynamical event, and one might worry that it will 
not be possible to pick these events out effectively be- 
cause our filters would be crude. This is not the case, 
however, because the greatest part of the snr of such 
an event comes from the gravitational radiation emit- 
ted by the decaying orbit, before the merger event. 
The estimates of LISA's sensitivity to mergers have 
been made assuming that all the snr comes from the 
orbit. Any further radiation from the merger event, 
which is really the goal of the observation, will be a 
bonus insofar as it contributes to the detectability of 
the event. 

But with such good snr from the orbital radiation, 
it will be possible to predict accurately the fiducial 
time, i.e. the moment when we expect to receive 
the radiation from the merger of the two black holes. 
Even if that radiation is relatively weak, we may then 
be able to study it in detail. By the time that LISA 
flies, one hopes that the numerical simulation of black 
hole mergers now being developed to run on super- 
computers will produce accurate templates, so that 
we can study the details of the merger as a test of 
black-hole theory in general relativity. 

Cutler (private communication) has pointed out an- 
other bonus of having a good coalescence time. If the 
data analysis can be done fast enough to predict the 
coalescence before it occurs, then astronomers can 
be alerted to the imminent event and given a crude 
position. It may well be that the merger will be ac- 
companied by some kind of optical or X-ray display. 

The construction of templates even for the orbital 
radiation is itself a difficult task. The orbital sys- 
tem is approximated as two point masses orbiting in 
general relativity. The holes may make several hun- 
dred orbits while LISA watches them, before they 
merge. During this time, the template must not 
get out of phase with the orbit, so the orbital de- 
cay must be tracked accurately. The simple Newto- 
nian and quadrupole approximations that suffice to 
give factor-of-two accuracy for most sources (see my 
first lecture at this school) do not give the required 
accuracy here. An approximation scheme called the 
post-Newtonian method must be used, and it must 
be pushed to very high order to get good results. This 
is being done now in order to provide good templates 
for ground-based observations of neutron-star bina- 
ries, and the same work will be applicable to LISA's 



sources. See Cutler et al. 1993; Blanchet et al. 1995;
Blanchet 1997 for more details.

As remarked earlier, the job of filtering for each different
coalescence time is made much more tractable
by using the Fast Fourier Transform algorithm. This
is so important that it must be described here.



3.2.2. Filtering using the Fast Fourier Transform: 
benefits and pitfalls 



The assumption of stationary noise leads us naturally 
to use DFT's for our analysis, since (as we have seen) 
the noise at different frequencies is uncorrelated. But 
there is an additional reason for using DFT's, which 
is that the convolutions needed for filtering for different
fiducial times in Equation (48) are most rapidly
done using the well-known Fast Fourier Transform 
(FFT) algorithm. 

An expression like the standard DFT,

$$\tilde{x}_k = \sum_{j=0}^{N-1} x_j \exp(-2\pi i jk/N),$$

would require N multiplications and N additions for
the sum that produces each element of the transform.
Since there are N values of k for which this needs to
be done, there are of order $N^2$ operations to compute
the DFT in a straightforward manner. The FFT algorithm,
by making use of symmetries of the unit
complex numbers $\exp(-2\pi i jk/N)$, reduces this to a
number of order $N\log_2 N$. For a data set containing
$N = 10^8$ data values, as LISA will have (see below),
this is a speedup of a factor of more than one million.
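
The size of the speedup is easy to verify. The following lines are a rough sketch that counts only the leading-order operations, ignoring the constant factors that depend on the particular FFT implementation.

```python
import math

n = 10**8                        # number of data values quoted in the text

direct_ops = n**2                # straightforward DFT: N operations for each of N outputs
fft_ops = n * math.log2(n)       # FFT: of order N log2 N
speedup = direct_ops / fft_ops   # equals N / log2 N

print(f"log2(N)  = {math.log2(n):.1f}")
print(f"speedup  = {speedup:.2e}")
```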

The same is true of the convolution and correlation. 
If we want to evaluate a filter with a variable fiducial 
time, 



$$G(t_c) = \sum_j x(t_j)\,q(t_j - t_c),$$



then we must again perform N operations for each
value of $t_c$, and there will be N possible values of $t_c$.
But if we use the correlation theorem, Equation (19),
we have only one multiplication for each value of the
Fourier transform of the correlation,

$$\tilde{G}_k = \tilde{x}_k\,\tilde{q}_k^*.$$


In addition to this we have to perform three Fourier
transforms: two to obtain $\tilde{x}$ and $\tilde{q}$ (although the latter
may already be available from Equation (22)),
and a third to go back from $\tilde{G}$ to $G$. Adding these
up still gives an operation count of order $N\log_2 N$
rather than $N^2$.

So matched filtering for signals with unknown fiducial 
times is usually done with FFT's. However, there 
is a subtle problem that the user must beware of. 
This is the problem of wraparound, or end effects in 
the correlation. The correlation of the data with the 
moving filter, 

$$G(t_c) = \sum_j x(t_j)\,q(t_j - t_c),$$

is only well-defined if $t_j - t_c$ is within the range of
times defined for the filter q, which cannot be larger
than N. Therefore, for any $t_c$ there will be a range of
values of j for which $q(t_j - t_c)$ is not defined. When
the filter q is of short duration compared to the length
of the data set, the values of q will be zero at almost
all these points, so the correlation can be performed
without error. But there will always be some values
of $t_c$, usually at the beginning or the end of the range
of times, where the summation reaches the end of the
range for $t_j$ without reaching the end of the duration
of the filter q. In this case, the correlation is not
well-defined. In fact, the Fourier method of calculating
the correlation makes everything periodic outside the 
range of data values, so it wraps the filter around to 
the other end and convolves the remaining non-zero 
part of the filter with data from the other end of 
the range of observation. These will clearly not be 
correct values and must either be ignored or corrected 
in some way. 
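
The wraparound problem, and one common way of correcting it, can be seen in a toy example. The sketch below computes the correlation three ways: directly (treating the filter as zero outside its defined range), through the Fourier transform without precautions, and through the Fourier transform after padding both series with zeros. Zero-padding is a standard remedy but is my choice of illustration, not a method prescribed in the lecture. The small recursive FFT is included only to keep the example self-contained; in practice one would use an optimised library routine. All names and data values are invented for the example.

```python
import cmath

def fft(values, inverse=False):
    """Recursive radix-2 FFT (unnormalised); len(values) must be a power of two."""
    n = len(values)
    if n == 1:
        return list(values)
    even = fft(values[0::2], inverse)
    odd = fft(values[1::2], inverse)
    sign = 1.0 if inverse else -1.0
    out = [0j] * n
    for k in range(n // 2):
        twiddle = cmath.exp(sign * 2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + twiddle
        out[k + n // 2] = even[k] - twiddle
    return out

def circular_correlation(x, q):
    """G[c] = sum_j x[j] q[(j - c) mod N], using G~_k = x~_k conj(q~_k) for real q."""
    n = len(x)
    xf, qf = fft(x), fft(q)
    gf = [a * b.conjugate() for a, b in zip(xf, qf)]
    return [v.real / n for v in fft(gf, inverse=True)]

def direct_correlation(x, q):
    """G[c] = sum_j x[j] q[j - c], with q taken as zero outside its range."""
    n, m = len(x), len(q)
    return [sum(x[j] * q[j - c] for j in range(c, min(n, c + m))) for c in range(n)]

x = [4.0, 0.0, 0.0, 0.0, 0.0, 0.0, 1.0, 2.0]   # toy data, N = 8
q = [1.0, 2.0, 3.0]                             # short toy filter

direct = direct_correlation(x, q)

# Naive use of the FFT: the filter is extended to length N and wraps around.
wrapped = circular_correlation(x, q + [0.0] * (len(x) - len(q)))

# Zero-pad both series to a power of two >= N + len(q) - 1, then keep N lags.
size = 16
padded = circular_correlation(x + [0.0] * (size - len(x)),
                              q + [0.0] * (size - len(q)))[:len(x)]

print("direct :", [round(v) for v in direct])
print("wrapped:", [round(v) for v in wrapped])
print("padded :", [round(v) for v in padded])
```

In this example the unpadded Fourier result differs from the direct sum only at the last two lags, where the tail of the filter has been wrapped onto the first data value; the padded result agrees with the direct sum at every lag.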



3.2.3. Computational demands of filtering 



Suppose there is a single family of waveforms that 
one wants to look for, and suppose we have 3 years 
of LISA data to look in. The data need only be 
sampled effectively at about 1 Hz, since the upper 
limit on the LISA band is well below 1 Hz. So in 3 
years there will be $10^8$ data samples. To search the
whole data set for a single signal of unknown fiducial
time would require of order $10^{10}$ multiplications. On
a modern high-performance workstation (not even a 
parallel supercomputer), with a memory of 1 GB and 
a speed of 1 Gflop, this would take only 10 seconds. 
On computers that will be cheap by the time LISA 
flies, this will be done in a small fraction of a second. 
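
The estimate in this paragraph can be reproduced roughly as follows. The sketch assumes that the cost is dominated by the three FFTs described earlier and counts each as $N\log_2 N$ operations, ignoring implementation-dependent constant factors; the 1 Gflop speed is the figure quoted in the text.

```python
import math

SECONDS_PER_YEAR = 3.156e7

sample_rate_hz = 1.0
n_samples = 3 * SECONDS_PER_YEAR * sample_rate_hz   # three years of data

n_ffts = 3                                          # two forward, one inverse
operations = n_ffts * n_samples * math.log2(n_samples)

flops = 1e9                                         # 1 Gflop workstation
seconds = operations / flops

print(f"samples     = {n_samples:.2e}")
print(f"operations  = {operations:.2e}")
print(f"time        = {seconds:.1f} s")
```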

Of course, there will be many different waveforms, 
not just one. Coalescing binary black holes could be 
visible to LISA over a range of chirp masses (per- 
haps requiring filtering for several thousand different 
chirp mass values), over the whole sky (perhaps $10^3$
separate patches of 0.1 rad on a side), and further pa- 
rameters to describe their spins and so on. Even so, a 
single workstation today could do the entire data set 
in a time comparable to the time it took to acquire 
the data. 

But coalescing binaries are not as big a family as the 
family of waveforms that might be produced by com- 
pact masses falling into massive black holes. Here 
the orbits will not be circular, and the interaction 
between the orbit of the compact object and the spin 
of the hole will lead to very complex waveforms ex- 
tending for thousands of observable orbits. So far we 
do not have very reliable estimates of the size of the 
parameter space that will need to be searched, but it 
is likely to be enormously larger than that for black- 
hole coalescences. Even so, assuming a doubling of 
computer speed every 1.5 years between now and, 
say, 2010 (to pick an optimistic time for data analy- 
sis to begin), we could afford the parameter space to 
be 1000 times larger and we could still do the search 
with 4 workstations. 

A more serious, and so far unexplored, problem asso- 
ciated with searching for compact masses may be the 
nearly-chaotic nature of their orbits around a rapidly 
spinning black hole. A small change in the initial 
data for the orbit may make a very large change in 
the nature of the orbit — particularly its plane - 



many orbital periods later. The key requirement of 
matched filtering, that it must keep in phase with 
the real orbit, may limit filters to being built out of 
pieces that are only a few periods long, and which 
must be joined together in unpredictable ways. It 
may be hard to fit real signals accurately in this way. 



3.3. Looking for Galactic Binaries 

For a long-lived signal like that expected from a bi- 
nary system, there is no real fiducial time, and the 
problem of sliding a filter through the data does not 
arise. Instead, we would just expect to do a single 
Fourier transform of the data and look for a peak at a 
fixed frequency, correcting perhaps for a small chirp 
(change in the frequency during the observation). 

This is too simplistic, however, because of the in- 
duced phase modulation that we discussed earlier. 
The simplest way to correct for this would be to try 
to reconstruct the data that a detector at rest in a 
single location (such as the barycentre of the Solar 
System) would observe. This can be done from the 
data registered by the real, moving detector only if 
we assume a particular location on the sky for the 
source. As discussed above, this must be done independently
for something like $10^3$ different locations.
Once a signal has been identified, its position can be 
refined to a smaller area, inversely proportional to 
the square of the snr, but for the initial search the 
sky can be divided into about 1000 patches. 

Once the reconstructed data set is obtained, then 
an FFT will reveal any constant-frequency binaries. 
Any chirping binaries would have to be looked for 
with other techniques, either matched filtering or 
time-domain resampling. Even so, the amount of nu- 
merical work involved would be much less than that 
involved in searching for compact objects falling into 
massive black holes. 



This is a great contrast to the situation for ground- 
based detectors. Because they operate at a higher 
frequency, there are many more independent loca- 
tions on the sky, and the parameter space that needs 
to be searched for continuous sources is enormous
(Schutz 1991; Brady et al. 1997).



For the purposes of predicting what LISA can see,
the LISA team has adopted a threshold of $5\sigma$ for
constant-frequency sources. If the noise amplitude
is Gaussian, then the pdf of the power spectrum
is a simple exponential (since it is the sum of the
squares of two Gaussian-distributed Fourier amplitudes).
If we set a threshold of 5 on the spectral
amplitude (which is what is plotted in the LISA sensitivity
diagram), then this is a threshold of 25 on
the power spectrum. The false-alarm probability associated
with this threshold is of order $10^{-11}$. Since
the frequency resolution in a 1-year observation is
$3 \times 10^{-8}$ Hz, and the LISA bandwidth is no larger
than 0.1 Hz, there are no more than $3 \times 10^6$ independent
elements in the DFT. The probability that at
least one frequency component will reach $5\sigma$ is therefore
less than $10^{-4}$. This seems like a safe threshold
to set.
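
These numbers can be checked with a few lines of arithmetic. The sketch assumes that the power is normalised so that the probability of exceeding a value P in pure noise is exp(-P), which is the normalisation under which a threshold of 25 gives a probability of order $10^{-11}$ as quoted; the frequency bins are treated as independent.

```python
import math

SECONDS_PER_YEAR = 3.156e7

power_threshold = 5.0 ** 2                  # amplitude threshold of 5, squared
p_single = math.exp(-power_threshold)       # false alarm in one frequency bin

resolution_hz = 1.0 / SECONDS_PER_YEAR      # frequency resolution of a 1-year data set
bandwidth_hz = 0.1
n_bins = bandwidth_hz / resolution_hz       # independent frequency elements

# Probability that at least one bin exceeds the threshold,
# 1 - (1 - p)^n, written so as to avoid rounding problems for tiny p.
p_any = -math.expm1(n_bins * math.log1p(-p_single))

print(f"single-bin false alarm  = {p_single:.2e}")
print(f"frequency resolution    = {resolution_hz:.2e} Hz")
print(f"independent bins        = {n_bins:.2e}")
print(f"at least one false alarm = {p_any:.2e}")
```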

As part of its observation of a binary system, LISA
will determine the polarisation of the signal. This
will tell us the angle of inclination of the binary orbit 
to the line of sight to the system. This is an impor- 
tant parameter for modelling binaries, and one of the 
hardest to obtain by optical observations. Unless the 
system undergoes eclipses, radial-velocity measure- 
ments of the orbits of binary stars cannot distinguish 
between a relatively edge-on system or one seen from 
near its pole. If the latter orientation were correct, 
then the true velocities in the system would be much 
larger than the projected radial velocities observed 
from a spectrum, and the masses of the stars would 
have to be correspondingly higher. By contributing 
the actual angle of inclination, LISA will help as- 
tronomers pin down the masses of many known bi- 
naries, as well as of binaries that LISA identifies for 
the first time. The resulting information should have 
many implications for improving our understanding 
of binary evolution, and this will lead to improved 
predictions of black-hole formation rates, supernova 
rates, and so on. 



3.4. Detecting a Stochastic Background 

LISA will find a random background of gravitational 
waves if the noise in the waves exceeds the noise in the 
instrument. Unlike ground-based detectors, which 
will use cross-correlation techniques to dig below in- 
strumental noise, LISA is an isolated detector and 
can only see noise sources above its intrinsic noise. 
(There are two LISA interferometers, but correlating 
them has limited effectiveness: see the next section.) 

For this to work, one must have confidence that one 
understands the instrumental noise. This can be 
tested by various consistency checks on the space- 
craft, but in the end it is largely a matter of confi- 
dence in our understanding of the instrument. 

If the background is due to binary star systems, 
there are additional consistency checks one can make. 
First, the amplitude of noise should vary with time 
as LISA turns in its orbit and presents different parts 
of its antenna pattern to the Galaxy. Moreover, this 
is a confusion-limited background consisting of many 
individual stars. Some members of this population 
will be unusually close to LISA and/or at an unusu- 
ally high frequency, so that they will stand out from 
the background and be studied individually. From 
the statistics of the individual binaries it will be pos- 
sible to infer where the confusion background will 
be found, and thereby to identify it as gravitational 
wave noise rather than instrumental noise. But if the 
stochastic waves are cosmological and isotropic, then 
we have no such consistency checks in the data. 



3.5. Using Both LISA Interferometers 

Ground-based gravitational wave detectors rely on 
independent observations of events by two or more 
detectors ("coincidences") to gain confidence in the 
detection and to increase the information they obtain 
from an event. LISA can similarly gain from using 
data from both its interferometers. 

There are two important differences between LISA 
and ground-based interferometer networks. First, because 
LISA's two interferometers share a common 
arm, one cannot assume the noise in them is independent. 
Therefore, when working close to the detection 
limit, one should not expect to gain much 
confidence from seeing an event in both data streams. 
(Of course, if an event is seen in only one stream, and 
it lasts long enough for LISA to change its orienta- 
tion and hence its polarisation, then one can prob- 
ably assume the event was a noise event.) Second, 
LISA's interferometers are in the same location, so 
there is no time-difference between signals in the two 
instruments that could be used (as on the ground) 
for direction-finding. 

Despite these differences, there is usefully different 
information in the two data streams. From the point 
of view of their gravitational wave responses, the two 
instruments are independent. That is not to say that 
they are orthogonal in some sense, but simply that 
any given gravitational wave arriving along a direc- 
tion perpendicular to the plane of LISA can be ex- 
pressed as a linear combination of the two detector 
responses. As LISA rotates, therefore, it can use the 
two different responses to define the polarisation and 
to look for changes in the intrinsic polarisation of the 
signal. 
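
As a sketch of what "independent but not orthogonal" means here, consider a toy model: a wave arriving perpendicular to the LISA plane, in the long-wavelength limit, with each interferometer output taken as the difference of the fractional length changes in its two arms. The arm angles (0, 60 and 120 degrees) and all function names are assumptions made for illustration, not LISA's actual response functions.

```python
import math

def response_row(theta_a_deg, theta_b_deg):
    """Coefficients (a, b) such that output = a*h_plus + b*h_cross.

    Toy model: an arm at angle theta changes length by
    0.5 * (h_plus*cos(2*theta) + h_cross*sin(2*theta)), and the
    interferometer reads the difference between its two arms.
    """
    ta, tb = math.radians(theta_a_deg), math.radians(theta_b_deg)
    a = 0.5 * (math.cos(2 * ta) - math.cos(2 * tb))
    b = 0.5 * (math.sin(2 * ta) - math.sin(2 * tb))
    return a, b

# Both interferometers use the common arm at 0 degrees; their second arms
# lie at 60 and 120 degrees (flipping an arm by 180 degrees changes nothing).
ROW_1 = response_row(0.0, 60.0)
ROW_2 = response_row(0.0, 120.0)

def detector_outputs(h_plus, h_cross):
    """Outputs of the two toy interferometers for given polarisations."""
    return tuple(a * h_plus + b * h_cross for a, b in (ROW_1, ROW_2))

def recover_polarisations(s1, s2):
    """Invert the 2x2 response matrix by Cramer's rule."""
    (a1, b1), (a2, b2) = ROW_1, ROW_2
    det = a1 * b2 - a2 * b1
    return (s1 * b2 - s2 * b1) / det, (a1 * s2 - a2 * s1) / det

s1, s2 = detector_outputs(1.0, 0.3)
print(recover_polarisations(s1, s2))    # the input polarisations, up to rounding
overlap = ROW_1[0] * ROW_2[0] + ROW_1[1] * ROW_2[1]
print(f"row overlap: {overlap:.3f}")    # nonzero: the rows are not orthogonal
```

The response matrix is invertible, so both polarisation amplitudes can be recovered from the two outputs, even though the two response rows have a nonzero overlap.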

In practice, despite the fact that a single LISA instru- 
ment can sense polarisation (by watching the ampli- 
tude change as the detector rotates) , the contribution 
from the second instrument will be very important. 
In their study of signals from coalescing black holes, 
Cutler & Vecchio 1997 found that the second instrument 
helped to make a much cleaner untangling of 
the various parameters, so that the accuracy of de- 
termining the distance to the source improved enor- 
mously. 

I believe that the second detector is likely to be very 
important in detecting waves from compact objects 
spiralling into massive black holes. Because of the 
coupling of the orbital angular momentum to the spin 
of the hole, the plane of the orbit can change dramat- 
ically during the event, and this will change the po- 
larisation of the wave. This can happen on timescales 
much less than the orbital timescale of LISA, so that 
LISA cannot use its changing orientation to sense 
polarisation. Once the source's direction has been 
identified using phase modulation, then the polari- 
sation will be determined by the two detectors op- 
erating together. This will make the determination 
of other parameters much cleaner and easier: the in- 
trinsic amplitude of the signal, the distance to the 
source, and the direction. Then we can look for an 
association with a galaxy or cluster of galaxies. 

When sensing a stochastic background, two interfer- 
ometers could in principle do much better than one: 
by cross-correlating the noise in the two detectors, 
one eliminates independent noise sources and finds 
only the correlated noise, which could come from 
gravitational waves that stimulate both detectors. 
This is how ground-based detectors will search for 
a background at amplitudes far below their individual 
noise levels (Flanagan 1993). But this does not 
work for LISA: sharing a common arm, the two in- 
terferometers will automatically have correlated in- 
strumental noise. Coincidentally, for the 60° angle 
between LISA's arms, the signal-to-noise ratio after 
correlation of the two instruments will be the same 
as in each individual instrument. So if a gravitational 
wave background is hidden in an individual 
data stream by the instrumental noise, and if that 
noise is at the same level in all arms, then the back- 
ground will be just as hidden in the correlation of the 
two outputs. 
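
A toy Monte Carlo illustrates why the shared arm defeats cross-correlation. It assumes each arm carries independent Gaussian noise of equal variance and that each interferometer output is the common arm minus its other arm. These modelling choices and all names are illustrative assumptions, not a description of the real LISA noise budget.

```python
import math
import random

def correlation(xs, ys):
    """Sample correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

rng = random.Random(1)      # fixed seed so the run is repeatable
n_samples = 100_000

# Independent unit-variance noise in each arm.
arm_a = [rng.gauss(0.0, 1.0) for _ in range(n_samples)]  # common arm
arm_b = [rng.gauss(0.0, 1.0) for _ in range(n_samples)]
arm_c = [rng.gauss(0.0, 1.0) for _ in range(n_samples)]
arm_d = [rng.gauss(0.0, 1.0) for _ in range(n_samples)]  # extra arm, comparison only

# LISA-like pair: both outputs contain the common arm.
out_1 = [a - b for a, b in zip(arm_a, arm_b)]
out_2 = [a - c for a, c in zip(arm_a, arm_c)]

# Comparison: a second instrument that shares no arm with out_1.
out_separate = [d - c for d, c in zip(arm_d, arm_c)]

shared_corr = correlation(out_1, out_2)
separate_corr = correlation(out_1, out_separate)
print(f"shared-arm noise correlation:    {shared_corr:.3f}")
print(f"separate-instrument correlation: {separate_corr:.3f}")
```

In this toy model half of each output's noise variance comes from the common arm, so the correlation coefficient of the two noise streams sits near 0.5 instead of averaging toward zero as it does for the separate instruments.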

However, the two interferometers can in principle 
help to determine whether the background is isotropic 
or not, i.e. to distinguish between a cosmological 
background (isotropic) and a background due to bi- 
nary systems (stronger in directions toward the plane 
of the Galaxy). As LISA moves, it presents a differ- 
ent part of the antenna pattern to the Galaxy for 
one detector than for the other. If the detector noise 
is dominated by the Galactic background, then the 
noise should rise and fall as that detector's antenna 
pattern sweeps across the Galactic plane. The 
second detector would provide confirmation of this 
if its noise went up and down as well, but with a 
different phase. 



4. CONCLUSIONS 



I have introduced the elements of signal analysis as 
they are used for gravitational wave data from inter- 
ferometers. The basic techniques are the same for 
ground-based detectors as for LISA, but the data set 
of LISA will be much smaller (because of the lower 
observing frequency) and the data analysis problem 
will not be so demanding. However, there are some 
problems, as yet unsolved, that are unique to LISA. 
Most difficult is the question of how to devise suitable 
filters for the signals from compact stars falling into 
massive black holes in the centres of distant galax- 
ies. This is one of the most likely sources for LISA 
as well as one of the most important in terms of the 
fundamental information that the signals contain. It 
will be important, as LISA develops, to find a suit- 
able filtering method that does not lose many of these 
events. 

There is one kind of source I have not discussed here, 
and that is the one we don't expect. Given the kinds 
of sources that inhabit the LISA frequency band, any 
exotic process that produces signals we did not ex- 
pect will not only be interesting: it might be revo- 
lutionary. It will therefore be an important part of 
the development of a data analysis system for LISA 
to build into it the capability of responding to un- 
expected events. Here the notion of an event must 
be vague, so the event must stand up above noise in 
an unmistakable way. But the noise need not be the 
raw time-series noise: general families of filters, such 
as time-frequency methods (chirp filters, wavelets, 
bispectrum), nonlinear adaptive methods, and other 
general approaches can well be used to identify unexpected 
events. The important considerations here are 
that these methods must be defined ahead of time, 
and their statistics must be understood. Then it will 
be possible to claim that a statistically significant 
event was real. 

It is possible that by the time LISA flies, gravitational 
waves will have been detected by ground-based detec- 
tors. If this is not the case, then LISA needs to make 
reliable detection a high priority, selecting filtering 
methods and thresholds to minimize the chance of 
a false alarm. The same will be true when looking 
for rare but strong events, such as black-hole coalescences 
in the centres of distant galaxies. When LISA 
turns to the study of sources rather than their mere detection, 
and is dealing with numerous populations like 
the galactic binaries and the compact objects falling 
into massive black holes, then the criteria become a 
little different, and LISA can begin observing closer 
to its detection threshold. In the case of the galactic 
binaries that generate confusion noise, LISA will go 
below the conventional detection threshold in order 
to see how the nearby members of this class blend 
into the confused background. 

LISA's data analysis will probably not be computa- 
tionally demanding, but it will nevertheless require 
care and respect for the statistics of detection and 
estimation. LISA is not likely to be followed soon by 
another mission, so its observations will be the only 
time the low-frequency gravitational wave window is 
opened for some time. The task of the data analysis 
team will be to give us the sharpest possible vision in 
this window. What we see through that window has 
the possibility of fundamentally changing our view of 
Nature. 